

Identities = 147/531 (27%), Positives = 261/531 (48%), Gaps = 26/531 (4%)

[Alignment rows garbled in the source; only row end coordinates (Query 19 to 496, Sbjct 16 to 473) and fragments of the match line survive.]



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 813 

A DNA sequence (GBSx0862) was identified in S.agalactiae <SEQ ID 2469> which encodes the amino 
acid sequence <SEQ ID 2470>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3319 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
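
The "Final Results" lines in these examples share a fixed shape: a predicted compartment, a certainty score, and a qualitative label. A minimal Python sketch, assuming that line shape, shows how such lines could be read and the highest-scoring compartment picked out. The names `parse_psort_results` and `best_site` and the regular expression are illustrative assumptions, not part of the prediction program itself.

```python
import re

# Illustrative pattern (an assumption) for lines such as
# "bacterial cytoplasm --- Certainty=0.3319 (Affirmative) < succ>".
# The dashes are optional because some result lines omit them.
_RESULT = re.compile(
    r"^\s*(?P<site>bacterial \w+)\s*(?:-+\s*)?"
    r"Certainty\s*=\s*(?P<score>\d*\.\d+)\s*\((?P<label>[^)]+)\)"
)


def parse_psort_results(text):
    """Return a list of (site, certainty, label) tuples from result lines."""
    results = []
    for line in text.splitlines():
        match = _RESULT.match(line)
        if match:
            results.append(
                (match.group("site"),
                 float(match.group("score")),
                 match.group("label"))
            )
    return results


def best_site(results):
    """Return the compartment with the highest certainty, or None if empty."""
    return max(results, key=lambda r: r[1])[0] if results else None


example = """
bacterial cytoplasm --- Certainty=0.3319 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_psort_results(example)
print(best_site(parsed))
```

Lines that do not match the assumed shape are skipped rather than raising an error, so text around the results block can be passed in unchanged.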

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB41469 GB:L35061 orfL4 [Bacteriophage phi-41] 
Identities = 86/374 (22%) , Positives = 166/374 (43%) , Gaps = 38/374 (10%) 

Query: 12 FARI FRPNNRKSTRTYLQRS I S YWRRNSI YIjDNI YNKI STDTAQLRFKHVKITRNPGGVD 71 

F+R N+ + + ++ Y S ++ NI+NKI+ + ++ F HVK ++ G D 
Sbjct: 10 FSRGKIJSINDTQRVTAWQNEAVEY TSAFVTNIHNKIANEITKVEFNHVKYKKSDVGSD 66 



Query: 72 SMVWYEHSDLAEVLTVSPNPLETOVvWSNvTRAMLRDGVAVWPRW- -KNGRLVEIWLA 129 



WO 02/34771 



PCT/GB01/04789 



Sbjct: 67 TLI SMAGSDLDEVLNWS S KGERNSME FWQKVI KKLLTTRYIDLYPI FDRKTGDLVDLLFA 126 

Query: 130 KKTVTWTAESVELMLDDVAWLPLTDVWFENPKLNVTAQLNQITELIDINLNALTB 189 

+ E + ++ + N4 T 44D L 4 KL 

Sbjct: 127 DNKKEYKPEELVRLISPFYI NEDTSILDNALAGIQTKLE 165 

Query: 190 DGNSSLRGFLKLPT KAADEHLKQQARDRVDSMLDLAKNGGIAYLEQGEEFQELSKDY 246 

G ++G LK+ D+ K +A + +M +++ G+ 4 E Eh KDY 

Sbjct: 166 Q/3K--MKGLLKINAFIDTDNDQEFKDKAMLTIKNMQEMSNYNGLTPTDNKTEIVELKKDY 223 

Query: 247 STASKEELEFLKSQLYNAHGINEKLFTCDYTEEQYRAYYSSVMKLYQRVYSEEINRKYFT 306 

S +K+E++ +KS+L + 4NE + ++EQ 4Y4S 4- 4E4 K + 

Sbjct: 224 SVLNKDEIDLIKSELLTGYFMNENILLGTASQEQQIYFYNSTIIPLLIQLEKELTYKLIS 283 

Query: 307 KTAR--TQGN KLLVFFDMADMISFKDLVEGGFlCSiCYAGLMNSNEFRETYLGLPGYE 360 

R +GN 444V + + K+L++ 44 4 N4 4G E 

Sbjct: 284 TNRRRWKGNLYYERIIVDNQLFKFATLKELIDLYHENINGPIFTQNQLL-VKMGEQPIE 342 

Query: 361 GGEVFEUffiNAVRI 374 

GG4V4 NLNAV 4 
Sbjct: 343 GGDVYIANLNAVAV 356 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 814 

A DNA sequence (GBSx0863) was identified in S.agalactiae <SEQ ID 2471> which encodes the amino 
acid sequence <SEQ ID 2472>. This protein is predicted to be a prohead protease. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3496 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF31089 GB:AF069529 protease [Bacteriophage HK97] 
Identities = 52/142 (36%) , Positives = 73/142 (50%) , Gaps = 11/142 (7%) 

Query: 21 FFAYASTYDOTDREGDVMAKGCFDOTLKSKA-WPMCLNHDR-NCVIGKHE-LSVDEKGL 77 

FE YAS 44NTD 4GD44 G F N L 44 V M NH 4GK 4 L4 DEKGL 

Sbjct: 26 FEGYASVFNNTDSDGDIILPGAFKNALANQTRKVAMFFNHKTVffiLPVGKOTSLAEDEKGL 85 

Query: 78 RTRSTFNLSDPEAKKTYDLMKMGALDSLSIGFFI - - KDYEPIDAKQPYGGWI FKEVE -IF 134 

r A M4 G 44 4S4GF 4 DY I G IFK 44 4 

Sbjct: 86 YWGQLTPGHSGAADLKAAMQHGTVEGMSVGFSVAKDDYTIIPT GRIFKNIQALR 140 

Query: 135 EISWTVPANPQATVDNIKEFD 156 

EISV T PAN QA 4 4K D 
Sbjct: 141 EISVCTFPANEQAGIAAMKSVD 162 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 815

A DNA sequence (GBSx0864) was identified in S.agalactiae <SEQ ID 2473> which encodes the amino 
acid sequence <SEQ ID 2474>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2247 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10155> which encodes amino acid sequence <SEQ ID
10156> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 



Query: 51 LEQLKTDAESLVSQATA- -IKETIAGLDSDISETEEELSK-AAKIIK EKQK 98 

L +LK + SL SQ +K I L ++E E+ LS+ + 4- 1 IK EK K 

Sbjct: 13 I^LICENNVSLKSQINGFEVKNAIEDLPK-VQELEKTLSENSIEIIKIENELNAQEEKPK 71 

Query: 99 GNTPM-DYLKTKAAALDFraiLMDNEGSANSAl^ 155 

G M ++++++ A +F +L N G + + AW ALE GV T+ T LP ++ 
Sbjct: 72 GKAKMTNFIESQNAVTEFFDVLKKNSGKSE - 1 KNAWNAKLAENGVTITDTTFQLPRKLVE 130 

Query: 156 AIQDAFTMYNGILN- -HVSKDPRYAVRVALQTQVSQAKGHKAGKTKKDEDFTFLDFTINS 213 

+ 1 A N N + HV+ V + + ++A+ HK G+TK ++ T T+ 

Sbjct: 131 SINTALLNTNPVFKVFHVTNVGALLVSRSFDSS-AEAQVHKDGQTKTEQAATLTIDTLEP 189 

Query: 214 ATVY- IKYAFEYSDLKKDTTGAYFNYVMKELAQGFI -RTIERAWIGDGKSN-SAEDKIT 270 

VY ++ E + + +N ++ EL Q + + ++ A+V GDG + + DK 

Sbjct: 190 VMVYKLQSLAERVKRLQMSYSELYNLIVAELTQAIVNKIVDLALVEGDGSNGFKSIDKEA 249 

Query: 271 E1KSIAEET 279 

++K I + T 
Sbjct: 250 DVKKIKKIT 258 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 816 

A DNA sequence (GBSx0865) was identified in S.agalactiae <SEQ ID 2475> which encodes the amino 
acid sequence <SEQ ID 2476>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3068 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 



No corresponding DNA sequence was identified in S.pyogenes. 






Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 817 

A DNA sequence (GBSx0866) was identified in S.agalactiae <SEQ ID 2477> which encodes the amino 
acid sequence <SEQ ID 2478>. Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0437 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 818 

A DNA sequence (GBSx0867) was identified in S.agalactiae <SEQ ID 2479> which encodes the amino 
acid sequence <SEQ ID 2480>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3181 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10153> which encodes amino acid sequence <SEQ ID
10154> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 819

A DNA sequence (GBSx0869) was identified in S.agalactiae <SEQ ID 2481> which encodes the amino 
acid sequence <SEQ ID 2482>. This protein is predicted to be a major structural protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 29 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3364 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>






The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA74331 GB:L33769 unidentified ORF28; putative [Bacteriophage bIL67]

Identities = 55/201 (27%) , Positives = 84/201 (41%) , Gaps = 18/201 (8%) 

Query: 9 EVTHGNflNGF-YAKIAKTDAGflLDLQKPYPFTGLRSTSFETSQESNAYYAD-NVEHVRLQ 66 

E+THG G + + + G P GLR ++ QE+ +YA N + + 

Sbjct: 8 ELTHGLGYGWFTDLTGSKTGI PIAGLRGIETDSKQENKNFYAGFNAPYRTIA 60 

Query: 67 GKKSTEGSITTYQIPKQFMIDHLGKKLTNSTPPALIDTGVOTN-FIVJGYAETVTDEFGAE 125 

G K T+ + +Y +P F LG S L D N + + YAE D+ G 

Sbjct: 61 GAKDTQIKVKSYDLPDDFATHALG---FGSVQGFLTDDVANYKPYGFAYAERYRDDDGTG 117 

Query: 126 IEEFHIVmiVKASAPKGSTSTDETSATPKEIElPCTASPNNFIVDSEKKPVSEIVVJRDDS 185 

+ + +V+A+ P+ DESTKEE T++F+ +K+ + D 
Sbjct: 118 YKA-TFYPSVQATTPSDTAEMEESPTGKEYEHEATVTTGDFTLGDKKRL.FVKFKVSDTE 176 

Query: 186 KGT-VRGK---FDKLFADKSP 202 

T GK F KLF D P 
Sbjct: 177 LATGTSGKALAFKKLFTDLKP 197 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 820 

A DNA sequence (GBSx0870) was identified in S.agalactiae <SEQ ID 2483> which encodes the amino 
acid sequence <SEQ ID 2484>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2531 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 821 

A DNA sequence (GBSx0871) was identified in S.agalactiae <SEQ ID 2485> which encodes the amino 
acid sequence <SEQ ID 2486>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2972 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 822 

A DNA sequence (GBSx0872) was identified in S.agalactiae <SEQ ID 2487> which encodes the amino 
acid sequence <SEQ ID 2488>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3860 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 823 

A DNA sequence (GBSx0873) was identified in S.agalactiae <SEQ ID 2489> which encodes the amino 
acid sequence <SEQ ID 2490>. Analysis of this protein sequence reveals the following: 

Possible site: IS 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.22 Transmembrane 605 - 621 ( 569 - 631)
INTEGRAL Likelihood = -8.12 Transmembrane 583 - 599 ( 569 - 604)

Final Results

bacterial membrane --- Certainty=0.6689 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB70053 GB:AF011378 unknown [Bacteriophage sk1]
Identities = 159/709 (22%) , Positives = 285/709 (39%) , Gaps = 112/709 (15%) 

Query: 128 SILNLNKELDNVAKELDIVNQIOLELDPDNVELAEQKMKLLGKQSELAGDK\fQELKKKQAA 187 

S+ +N + + E + L+LDP N + Q K L Q L+ DK +LK++ ++ 
Sbjct: 21 SLKGVNTAMSGLRGEAKNLRDALKLDPTNTDKMAQLQKNLQTQLGLSRDKATKLKQELSS 80 

Query: 188 LGDEK- IGTEEWRQLQNEIGQAEVEVLKIDRAMDILGESSRSATGDI- -KEATSYLRADV 244 

+ G ++W QL ++G AE + +++ + + + S + DI K T + + + 

Sbjct: 81 VDKSSPAGQKKWLQLTRDLGTAETQANRLEGEIKQVEGAISSGSWDIDAKMDTKGVNSGI 140 

Query: 245 MMDVADKAG QIGQKMVDAGKMTVDAWSE I DEALDTVTTKTGLTGD 289 

+ +G QIG V A + W + +A+DT L 

Sbjct: 141 DGMKSRFSGLREIAVGVFRQIGSSAVSAVGNGLKGW--VSDAMDTQKAMISLQNTLKFKG 198 



Query: 290 ALAELQEIAKDIATG MPTSFQNAGD AVGEL -NTQFGLT 326 

+Q +AKD + T+F GD AVG+ N FG T 

Sbjct: 199 NGQDFDYVSKSMQTLAKDTNANTEDTLKLSTTFIGLGDSAKTAVGKTEALVKANQAFGGT 258 

Query: 327 GEKLKSASELL IKYAEINE-TD ISSSAISAKQAIEAYG- -LTAE 367 

GE+LK + + IN+ TD + S+ + A++ YG +A 

Sbjct: 259 GEQLKGWQAYGQMSASGKVSAENINQLTDNNTAK3SALKSTVMEMNPALKQYGSFASAS 318 

Query: 368 DLGMV LDNVTKAAQDTGQSVDTIVQKAIDGAPQIKGLGLSFEEGA ALIGK 417 
4-G+ LD+ G T+AD + LL A 44l K

Sbjct: 319 EKGAISVEMLDKAMQKLGGAGGGAVTTIGDAWDSFNETLSLALLPTLDALTPIISSIIDK 378 

Query: 418 FEKSGVDSSAALSSLSKAAVIYAKD- -GKTLTDGLNETVSAIQNSTSET- -EALSIASEI 473 

G 4 AL S4 K Y K+ G +G ++S 14 T LSI ++ 
Sbjct: 379 MAGWGESAGKALDSIVK YVKELWGALEKNGALSSLSKIWDGLKSTFGSVLSIlGQ.il 434 

Query: 474 FGSKAAPRMVDAIQRGAFSFDDLAEAAKSSSGTVSTTFDETLDPIDKLTQYSNQAKEGMA 533 

S A +D+ 4 A + 44 S T44 D I K4 44 4 E 

Sbjct: 435 IESFAG IDS KTGESAGSVENVSKTIANLAKGLADVI KKIADFAKKFSESKG 485 

Query: 534 ELGGKLLETVIPALEPLMGMLESSVNWFTSLNETDQ-QTIVILGLVTTAVMMLLGAIAPL 592 

+ L+T + AL + T+++ + QT + G + AI P 

Sbjct: 486 AID--TLKTSLVALTAGFVAFKIGSGIITAISAFKKLQTAIQAGTGVMGAFNAVMAINPF 543 

Query: 593 VIAIGAIGAPVGIWAAIV-GAIAVITLIIQAIMNWGAITEWLQSTWDSCAA W 644 

V +GI +AAIV G + T W 4 ++L+S WD + W 

Sbjct: 544 VA LGIAIAAIVAGLVYFFTQTETGKKAWASFVDFLKSAWDGIVSFFSGIGQW 595 

Query: 645 LSELWTNIVTTATTAWSNFTAWLSGLWSSWSTGQSLWSSFTSSLSNIFSSLITGAQSLW 704 

+++W V A W W SG+ V Q++W+ T+ + ++++++TG Q+ W 
Sbjct: 596 FADIWNGAVDGAKGIWQGLVDWFSGIVQGV QNIWNGITTFFTTLWTTWTGIQTAW 651 



Query: 705 £ 

4 T + LW G+V+ 4 4F 4SS 44G 4N 44T 4 4 KS 
Sbjct: 652 AGOTGFFTGLWDGIVNWTTVFTTISSLVTGAYNWFVTTFQPLISFYKS 700 



There is also homology to SEQ ID 2492. 

A related GBS gene <SEQ ID 8663> and protein <SEQ ID 8664> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -13.98 
GvH: Signal Score (-7.5): -2.78 

Possible site: 16 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -14.22 threshold: 0.0 

INTEGRAL Likelihood =-14.22 Transmembrane 605 - 621 ( 569 - 631) 

INTEGRAL Likelihood = -8.12 Transmembrane 583 - 599 ( 569 - 604)

PERIPHERAL Likelihood = 4.45 
modified ALOM score: 3.34 



*** Reasoning Step: 3 



Final Results

bacterial membrane --- Certainty=0.6689 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

27.1/51.7% over 981aa

Bacteriophage sk1
GP|2392838| unknown Insert characterized

ORF0047K328 - 2976 of 3333)
GP|2392838|gb|AAB70053.1||AF011378 (9 - 990 of 999) unknown {Bacteriophage sk1}

%Match =7.3 

%Identity =27.1 %Similarity = 51.7 

Matches = 164 Mismatches = 275 Conservative Sub.s = 149 



MSINQEEKKTLSNADLLSVMSD*KERRKSMTET?EGLYWFGAOTV3FDRSVKGINTALSSLKKDFNNINRQLKMDPDNV 
: = 1: II =1= hlMibl 1= I h ll = || 1 
f4ASNATFEVEIYGNTTKFEKSLKG\/NTAMSGLRGEAKNLRDALKLDPTNT 
DLI^KLVlttQEQARVGAIKIAELKKQQKaLGESE-VGSAQtWKLQLEIAKVESQMKIVDKAMESTiaCHIEDVGDPKSIL
I = = III 1 = I =11:: :: ■ I I :| =| :: |:| :: :: : 

DKI^QLQKKLQTQLGLSRDKATKLKQELSSVDKSSPAGQKKIi«JQLTRDLGTAETQANRLEGEIKQVE 

60 70 80 90 100 110 

1053 1083 1113 1143 1167 1197 1227 

NLWKELD DVMMDVADKAGQIGQKMUDAGKMTVDAWSEIDHMDTVTTKTGLTG--DAIAELQEIAKDIATGMPTSF 

I =: =1 =11 =11 M I = : hill = = =1 

GAISSGSW-DIDAKMDTKGVNSGIDGMKSRFSGLREIAVGVFRQIGSSA 

120 130 140 150 160 



= 1 

VSAVGNGLKGWSDAMDTQKAMISLQOTLKFKGNGQDFDWSKSMQTIAKDTimm'EDTLKLSTTFIGLGDSAKTAVGKT 
180 190 200 210 220 230 240 

1269 1299 1329 1359 1389 1416 1446 1476 

DAVGELOTQFGLTGEKLKSASELLIKyAEINETDISSSAISAJCQAIFAYG-LTAEDLGMW.DNVTKAAQDTGQSVDTIVQ 
=1= = I II MM : I : | | ::||:: : || | 

EALVKANQAFGGTGEQLKGV VQAYGQMSASGKVSAENINQLTDNNT 

260 270 280 290 

1506 1536 1566 1596 1626 1656 1686 1716 

KAIDGAPQ1KGLGLSFEEGAALIGKFEKSGVDSSAALSSLSKAAVIYAKDGKTLTDGLNETVSAIQNSTSETEALSIASE 

I III:::::: ||| 



300 310 

1746 1794 1824 1854 1884 1914 1944 

IFGSKAAPRMVDAIQR GAFSFDDLAEAAKSSSGTVSTTFDETLDPIDKLTQYSNQAKEGMAEIiGGKLIjETVIPALE 

1= = = 1=1= I : : :| I : |:| : III : : | || |:::: :: 

-KGAISVEMLDKAMQKLGGAGGGAVTTIGDAWDSFITOTLSIJ^LPTLDALTPIISSIIDK^GWGESAGKALDSIVKYVK 
330 340 350 360 370 380 390 

1974 2004 2034 2064 

PLMGMLES SVNWFTSLNETDQQTI VI LGLVTTAVMMLLGAI APL 

1111= ::||:: :| | : : |= M = 

ELWGALEKN-GALSSLSKIWDGLKSTFGSVLSIIGQLIESFAGIDSKTGESAGSVENVSKT1AN FKKLQTAIQAGT 

410 420 430 440 450 460 

2082 2112 2139 2169 2199 2238 2268 

VIAIGAIGAPVGIWAAIV-GAIAVITLIIOAIMNVJGAITEVJLQSXITO SCAAWXSELWTNIVTTA 

Ml =11 =1111 Ml M ==l=l II | :::| | | 

GVMGAFmViVIAINPF-VALGIAIAAIVAGLWFFrQTETGKKAWASFVDFLKSAWDGIVSFFSGIGQWFADIWNGAVDGA 
540 550 560 570 580 590 600 

2298 2328 2358 2388 2418 2448 2478 

TTAWSNFTAWLSGLWSSWSTGQSLWSSFTSSLSNIFSSLITGAQSLWSSFTSTLSNLWSGLVSTGSNLFNNLSS 

I ' Ml: I Ml: |: == =: = = ::|l 1= h I :: || |:|: : s| :|| 

KGIWQGLVDWFSG1VQGV QNIVWGITTFFTTLVriTVVTGIQTAWAGVTGFFTGLWDGIVNVVTTVFTTISSLVTGA 

620 630 640 650 660 670 680 

2496 2526 2556 2586 2616 

TISGIFNGILSTASNIWNSIKSTISNAIDGAKNAVSNGVNA 

IMIIII I M= :| = || : : | : ::| : 

YNWFVTTFQPLISFY KNIVSGVFEaFGNFASKATOAITGVFNGIGSFFSDrFGGVKNTIDSVLGGVTDTIiqNIKGS 

870 880 890 900 910 920 

2646 2676 2706 2736 2766 2796 2826 2856 

IKNLFNFQIKWPHIPLPHFRVSGSANPLDWLKGGLPSIGIDWYAKGGI 

I = = III =: I 

I DWVASKVGGLFKGSMWGLTDVN 
930 940 950

2886 2916 2946 2976 3006 3036 3066 3096 

LCMGQS IMTITOOTSimiNVNFSGOT I REK^ 
| = | :: :|:| | | |=: || : 

LSSSGYGLSTNSVSSDNRTYNTFNVQGGAGQDVSNLARaiRREFELGRA 
960 970 980 990 

SEQ ID 8664 (GBS58) was expressed in and purified from E.coli as a GST fusion. The purified protein is 
shown in lane 10 of Figure 193. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 824 

A DNA sequence (GBSx0874) was identified in S.agalactiae <SEQ ID 2493> which encodes the amino 
acid sequence <SEQ ID 2494>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2732 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 825 

A DNA sequence (GBSx0875) was identified in S.agalactiae <SEQ ID 2495> which encodes the amino 
acid sequence <SEQ ID 2496>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2467 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10151> which encodes amino acid sequence <SEQ ID
10152> was also identified. A further related GBS nucleic acid sequence <SEQ ID 10935> which encodes
amino acid sequence <SEQ ID 10936> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2497> which encodes the amino acid 
sequence <SEQ ID 2498>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2136 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 55/240 (22%) , Positives = 92/240 (37%) , Gaps = 20/240 (8%) 



Query:     INELTIDGVKTSSFKCDVLVETRPNVIVSSS--KTALLEHDGISGAWQSNRHRC3LIEKP 61
           I ++ ID TSS VL I+S S + +G S + N + I
Sbjct: 2   IPKVIIDDFDTSSIPNCVLTGYDVGDILSPSFVENEAYGMNGTSRELESYNESKPTIM-- 59

Query: 62  YHITLIEPSDEEIYRFSALLNREKFW-LENEQEPTIRLWCYKVDSFEIGKDEFGAWWDV 120
           +H++ + +I L++FW + N ++YS+I +WV +
Sbjct: 60  WHLSTFDDAVNLINHLDGLSKKIEFWHIPNS IYYYDCLSVKINAVTMSSWRVTL 113

Query: 121 TFICHPTKFFKTTDIQTLTGNGVLRVQGSALAFPKIIWGQSASETSFTIGNQVIKLEKL 180
           +P ++ K + GNG + G+ + PKI V G + + TIG QV++L L
Sbjct: 114 KLALYPFRYAKGVSDWIAGNGNINNAGNVFSEPKIWKG--TGKGTLTIGKQVMKL-NL

Query: 181 SESLVMTNDPDNPSFKTASGKL--- 1 KWAGDF I TVDTAKGQNVG VVLGAGITSLKFETVW 237
           S 4- AG+I+GF+ G++ GIT W
Sbjct: 171 SGKATIECKHGQQCVYDAEGNVKNSIRIRGSFFEIQPG TQGIAVSGGITRTIISPRW 227



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.



Example 826 

A DNA sequence (GBSx0876) was identified in S.agalactiae <SEQ ID 2499> which encodes the amino 
acid sequence <SEQ ID 2500>. This protein is predicted to be PblB. Analysis of this protein sequence 
reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.00 Transmembrane 952 - 968 ( 952 - 968)

Final Results

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG18640 GB:AY007505 PblB [Streptococcus mitis]
Identities = 145/542 (26%) , Positives = 255/542 (46%) , Gaps = 52/542 (9%) 

Query:     MLFLLDANWTVKWNGIPLHEASSAIVKEETNGDFYLTVRYPITDSGIYQLIKEDMLIKS 60
           M++L + N PL+ A + + +E N + LT R+P +D +++ +KE+ +K+
Sbjct:     MIYLTNGNT PLNAAYADKI SQEANSTYQLTFRFPTSDV- LWEKLKEETFLKA 51

Query: 61
Sbjct: 52

Query: 121
           FSF SDI + TFNT + + D KHSI+G W G+LVR + + + ++ G+
Sbjct: 110 OTFSFFSDIDERHTFNTDSVNAMVRinTO-KHSILGQWGGDLVRHGYQVRLLKNGGS 168

Query: 181 A7ITTHKNLKS YQRTKNSQGVVTRI HARS TFKPDGAE - DEVTLRVS VDS PLINSYPY 239
           + KNL SYQ +++ + TRI ++T K +G + + V VDSPL+N Y
Sbjct: 169 3LFMYKKNLSSYQHKTSTKSLKTRITFKAT\ r KGEGEKAPDRKFSVVVDSPLvNKYSQ 228

Query: 240 INEKEYEfflMAETVED--LRKWAEAKFTNEGIDKVSrX^^ 297 
Query: 298 SRKHSADLYKKAIAYEEmLTEEYISITFDDKPGVGGSGVSSGLSN-VADAILVASATAQ 356

+ D+ KK Y ++ + ++ +SI F G SG+S+ LSN V+DA+ + Q 
Sbjct: 289 HDRPKMDVRKKITKYTYSPMAKKLLSIGF GQFKSGLSNMLSNAVSDAVKNETQHLQ 344 

Query: 357 D---VAVQRAVKNAMAAFDAEFGKTKTKI^DIEIAKAKVESFKSELSKRMDNQLLP--- 410 

4- + +KNA+ AFD + + + D + AKAK E K L+ +D + 
Sbjct: 345 GQFATQLGKEIKNADIAFDRKKEELWQFTDGLNAAJCP.KAEEVKKSLTETIDQRFRDFDS 404 

Query: 411 IATEAKNLASQAQADLTRKEIELRAELNRQVTSTEAVK 448 

LA EAK ++ QA+ + K E + ++ + TS + 
Sbjct: 405 TGLKEIKQKAEEALQRVGANTLLAQEAKQISEQARCCMDSKFAEYKQSVDGRFTSLSSQL 464 

Query: 449 ISLTNLSHNMDIIKQKALNDLRIl&ETRLKFADSVQQIAT^ 508 

+D + + ++L + E+D +++A + ++L + S +VGG 

Sbjct: 465 AGKANL IDFQRVQEKSNLYERI IGS SESD IAEKVARMTLTMQLFQVEVGKYS - AVGG 520 

Query: 509 YN 510 
N 

Sbjct: 521 PN 522 

Identities = 47/183 (25%) , Positives = 83/183 (44%) , Gaps = 22/183 (12%) 

Query: 867 VTTLRVTKGTIPADWSPSPDDLKAYSDTKLEQIANEIKASVTSLDHKTLKQTDITMTSEG 926 

+T L +GT W P+P+D +D LE T QT +T+ 

Sbjct: 667 MTELDFYEGTTDRRWQPAPEDATLETDKTLEAT QTKLTLLQGS 709 

Query: 927 IV1RAGKTSNDVARAIGSYFKUTPDAIALFSSLIKVSGNMLVDGSVTSRKLVTGAVETGH 986 

++ TS A +1 S T + I + + I++ G L+D +T+ + G 

Sbjct: 710 FAIQ-NLTS AGSIVSQINATMNQILIEAEKIRLKGKTLLD-ELTAIDGYFKRLFVGE 764 

Query: 987 WAGAITGVLIAAEAVTAEKLKVDQAFFNKIMANDAYLKQLFAKSAFITQVQSVTISASQ 1046 

+ ++ ++ +TA+KL +DQA +++D + L AK AFI +++SV +SA+ 

Sbjct: 765 GTFAKLNAEIIGSKTITADJOjINDQMffiRLFVSSDIFTDTLAAKEAFINKLRSVWSATL 824 

Query: 1047 ISG 1049 
G 

Sbjct: 825 FEG 827 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2501> which encodes the amino acid
sequence <SEQ ID 2502>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2445 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 143/552 (25%) , Positives = 251/552 (44%) , Gaps = 43/552 (7%) 

Query: 11 TvTCWNGIPLHFASSAIVKEETIKDFYLTVRYPITDSGIYQLIKEDMLIKSPVPVLGAQLF 70 

++K + PL A + +E N D+ L +YP LIK+ +++++ + G+QLF 

Sbjct: 3 S I KDDNTPLV7AAFEDE ITQEANSDYKLNFKYPAKHE - YRPLI KKGI I LEAD - DLHGSQLF 60 

Query: 71 RIKKPIENDDSMDITAYHVSDDlMKRSITPVSWGO^CfiMALSQMVQNAKTGLGDFSFTS 130 

RI + + +++ A V+DD+ +1 +SV +S++ + K FSF S 

Sbjct: 61 RIFEITKKHGYINVYANQVADDLNGYAIDTISOTRVOGMTVMSELaGSIKRE-HPFSFFS 119 

Query: 131 DIMDSRTFNTTETETLYSVU©GKHSIVGTWEGELVRDNFALSIKRSRGADRGWITTHK 190 

DI TFN ++ + L +GKHSI+G W GELVR+ + +++ + G D + K 
Sbjct: 120 DIDGRHTFNQSDVSVM-DALANGKHSIMGQWGGELVRNKYQINLLKKAGKDTETLFMYKK 178 
Query: 191 NLKSYQRTKNSQGWTRIH ARSTFKPDG AEDEVTLRVSVDSPLI 234

NLKSY+ T +G+V+ +H + DG + + T+RVSV+S L 

Sbjct: 179 NLKSYEETDTIKGLVSILHLVAEVEEEHEVETR3ASDGNIGHSESPKKKTIRVSVESKLK 238 

Query: 235 NSYPYINEK--EYE1^AE1VEDLRI{WAEAKFTOEGIDI<VSDAIEIEAYELDGQVVNLGD 292

+++P I EK + ++ + +T EDL + + F D ++++I+ V L D 

Sbjct: 239 DTHPIITOKTIKVQDQDVKTEEDLLAYGKKYFEK^CDIPGNSLKIDVTNNYEGAVRLFD 298 

Query: 293 TVNLKSRKHSADLYKKAIAYEFNALTEEYISITFDDKPGVGGSGVSSGLSNVADAILVAS 352 
T + + DL + Y F + SI F G + ++ 4SN D + S

Sbjct: 299 TAIVFHELYDRDLRMQITGYRFAPMANRLKSIIF GEIKTNLAKQI SNQIDNKVAES 354 

Query: 353 ATAQDVA VQRAVKNANAAFDAEFGKTKTKINDDIEIAKAKVESFKSELSNR-MDNQ 407 

D A 4Q+ +■ NAN FD + K 4- +1 D 1 + A+A E +E++ 4 44 4 
Sbjct: 355 TAQHDAAFEAKLQKQIDNANRIFDTKEAKLRESIEDGIKKAFANAEVKVAEVNAKVLERE 414

Query: 408 LLPLATEAK NLASQAQADLTRKEIELRAEUNRQVTSTEAVKISLTNLSHNMDIIK 462 

L A 4 4 +A +D+KERL 4 D 4 

Sbjct: 415 ELAKAVDERLKKFLSDADTKEQDFDKKLEEFRTSLKDLEVDEKQIDDALAKAGFSKDSLA 474 


Query: 463 QKALNDLRDAETRLKEADSVQQL-ATKRVEDKLTGLSTKLESFSVGGYNYVIDGGEPKEL 521 

+ET A4 V T 44L G + K+ +F GY + GE E 

Sbjct: 475 DIKAKLEDTSETATVTANIVGSTGGTFYNRNRLDGDTDKVITFE-QGYIDIAHNGEGFE- 532 

Query: 522 MANFYGKTYDIN 533
GKTY 1+ 
Sbjct: 533 EGKTYTIS 540 



A related GBS gene <SEQ ID 8665> and protein <SEQ ID 8666> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
SRCFLG: 0
McG: Length of UR: 11
Peak Value of UR: 1.54
Net Charge of CR: 1
McG: Discrim Score: -3.43
GvH: Signal Score (-7.5): -5.44
Possible site: 58
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1
ALOM program count: 1 value: -0.00 threshold: 0.0
INTEGRAL Likelihood = -0.00 Transmembrane 897 - 913 ( 897 - 913)
PERIPHERAL Likelihood = 1.48 932
modified ALOM score: 0.50
icm1 HYPID: 7 CFP: 0.100



*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

32.8/53.9% over 503aa

EGAD|33685| hypothetical protein Insert characterized
EGAD|71773|76294 hypothetical protein { } Insert characterized
SP|P15317|YHYA_BPH44 HYPOTHETICAL 65 KDA PROTEIN IN HYALURONIDASE REGION. Insert characterized
GP|215054|gb|AAA98102.1||M19348 ORF {Streptococcus pyogenes phage H4489A} Insert characterized
PIR|B30566|B30566 hypothetical protein - phage H4489A Insert characterized



ORF00870 (1957 - 3777 of 4272) 
EGAD|33685|35003 (37 - 540 of 593) hypothetical protein {Streptococcus pyogenes}
EGAD|71773|76294 hypothetical protein { } SP|P15317|YHYA_BPH44 HYPOTHETICAL 65 KDA PROTEIN
IN HYALURONIDASE REGION. GP|215054|gb|AAA98102.1||M19348 ORF {Streptococcus pyogenes phage
H4489A} PIR|B30566|B30566 hypothetical protein - Streptococcus pyogenes phage H4489A
%Match =4.4 

% Identity =32.8 %Similarity =53.8 

Matches = 137 Mismatches = 175 Conservative Sub.s = 88 

1749 1779 1809 1839 1869 1899 1929 1959 

TRLKEADSVQQLATKRVEDKLTGLSTKLESFSVGGYNYVIDGGEPKEU1ANFYGKTYDINPQLLERTSQATLSFSYEAES 

: | :| | 
MSRDPTYTINEHDLSFADGRFYVTFKADKSSETVRLN 



:| | -. || S! | : | | | : | :| : ||:: | : : | | 

SSCLGNTIIKKLQVEDDNTMHDFVKPKVTTQQAFGI^ 



|: | :|: =::| | := ||:| 
SAEQILLQVKSIDDERYSKFEQTLNGIKQTVKSESVESARTQLASMFDSRISGLDGKYSRLSQTIDSLSSRLDDGVGNYS 
130 140 150 160 170 180 190 

2271 2301 2331 2361 2388 2418 2448 

NLANLRTSTETLAGQLTSAESSIRQTSESFSNRLVSLETY-KDSEPNRASRYFEASKSETAK 

::: | : : | |:|:| | |:| : |:| ,|s, | || 

TLSQKVSGIDLRVSNAANDVSRLSQTAOGLQSQITNA NQNYSSLSQTVQGLQTTVRDNQSNATSRI 

210 220 230 240 250 260 

2478 2838 2868 2898 2928 2958 3009 

QLSALRTEVN SFVANNANFRANSLKIRFTDSQLKFRVTTLRVTKGTIPADWSPSPDDLK-AYSDT- - KLEQTANEI 

=11 = = l :|||| = = I I = I I = 1 =11 
- NQLSDLIST-KVTKGDVETTIAQSYDKIAFAIRDKLPASKMTGSEI 



3039 3069 3099 3129 3159 3189 3213 3243 

KASVTSLDHKTLKQTDITMTSEGIVLRAGKTSNDVARAIGSYFKVTPDAIALFSSLIKVSG-NMLVDG-SVTSRKLVTGA 

ill I >|::|.|: >ll I I » I 

IS AINLDRSGVKI TGKNI TLDGNSYI SNAVI KDA 

320 330 340 

3261 3291 3321 3351 3381 3411 3441 3471 

VETGHVKAGAI TGVLIAAFAVTAEKLKVDQAFFNKLMANDAYLKQLFAKSAFI TQVQSVTISASQI SGGVI KALNN 

■•:]■■ I = -1111 = 1 :|:hl lllill Ih l = = 1111= 1 I 11 = 11 111 = 1 = 111= I I 
HIANMDAGKINTGYU^SRIAAEAITGDKIKMDYAFFNKLT^ 

360 370 380 390 400 410 420 

3501 3537 3567 3624 3648 3678 

AMEIQMNSGQILYYTD QAALKRVLSGYPTQF^KFATGTVSG-KGNAGVTVIG- -SNRYGTESTNDGGFVGVR 

I =11 I = I II I II Ml I: I I =1 = 1 II 1= I =1 = I I III 



: : :| =: ||:| : | |: |, :| : = : | 

FFRYAEGLQHTAKVDQAEIYGDDI-WSDDFNIDRGFKMRPSLMPKMVDLNKMYQAIL^ 



A related DNA sequence was identified in S.pyogenes <SEQ ID 9059> which encodes amino acid sequence 
<SEQ ID 9060>. An alignment of the GAS and GBS sequences follows: 
[Alignment rows garbled in the source; only the row start coordinates (Query 370 to 588, Sbjct 897 to 1067) and a fragment of the match line survive.]



Identities = 34/151 (22%) , Positives = 62/151 (40%) , Gaps = 13/151 (8%) 



Query: 160 QNADKKLSASYQLGIDGLKATMRSDKIGLQAEIQTTAQGLYQRYDNEIRKLSAKITTTSS 219
Q A K +A++ K + D +A++++ L R DN++ L+ + +S
Sbjct: 306 QRAVKNM^FDAEFGKTKTKINDDIEIAKAKVESFKSELSNR^NQLLPIATEAKNLAS 365

Query: 220 GTTEAYESKLDGLRAEFTH SNQGMRVELES KISGLQSTQQATARQI SQE 268
K LRAE S + +++L+ K L +AR + +
Sbjct: 366 QAQADLTRKE IELRAEIiNRQ VTSTEAVKI SLTNLSHNMDI IKQKALNDLRDAETR- LKEA 424

Query: 269 I SNREGAVSRVQQGLDSYQRRLQS -AEGNYN 298
S ++ A RV+ L +L+S + G YN
Sbjct: 425 DSVQQLATKRVEDKLTGLSTKLESFSVGGYN 455



SEQ ID 8666 (GBS202) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 50 (lane 5; MW 132kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 827 

A DNA sequence (GBSx0877) was identified in S.agalactiae <SEQ ID 2503> which encodes the amino 
acid sequence <SEQ ID 2504>. This protein is predicted to be nuclear/mitotic apparatus protein. Analysis of
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2847 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
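
The localisation predictions listed under "Final Results" in these Examples follow a fixed line pattern (location, certainty value, verdict), so they can be tabulated mechanically. The following is only a minimal sketch: the function names `parse_final_results` and `predicted_location` are hypothetical, and the regular expression simply assumes the line layout seen in this document, tolerating stray spaces such as "0. 2847".

```python
import re

# Assumed line layout: "bacterial <location> [---] Certainty=<value> (<verdict>)".
# Stray spaces inside the number (e.g. "0. 2847") are tolerated and removed.
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([0-9][0-9 .]*[0-9])\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for each 'bacterial ...' line."""
    results = {}
    for location, raw_value, verdict in _RESULT.findall(text):
        certainty = float(raw_value.replace(" ", ""))
        results[location] = (certainty, verdict.strip())
    return results


def predicted_location(results):
    """Return the location carrying the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])


example = """
bacterial cytoplasm Certainty=0. 2847 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(predicted_location(parsed), parsed)
```

The sketch only reads back what the prediction program reported; it does not reproduce the prediction itself.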

Example 828

A DNA sequence (GBSx0879) was identified in S.agalactiae <SEQ ID 2505> which encodes the amino 
acid sequence <SEQ ID 2506>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3420 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 829 

A DNA sequence (GBSx0880) was identified in S.agalactiae <SEQ ID 2507> which encodes the amino 
acid sequence <SEQ ID 2508>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.54 Transmembrane 10 - 26 ( 2 - 28)

Final Results

bacterial membrane --- Certainty=0.4015 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MPPWLIDSTVWATOTVLGGLFSTIITTSANRKDQLIKHQYEDIKEDLSGLIDKVKTIDH 60 

MP WL D+ V+ ++T G+ + ++ K K EDI LS L +V ID 

Sbjct: 1 MPiWI^TAVLTTIITACSG^TVLUtfK^^ 60 

Query: 61 TTTETKKISKITKDGTLKIQRYRLFHDLTKEISQGYTTIEHFRELSILFESYQLLGGNGE 120

TT +++ +DGT KIQRYRL+HDL +E+ GYTT++HFRELSILFESY+ LGGNGE

Sbjct: 61 TTVAINHQNDVIQDGTRKIQRYRLYHDLKREVITGYTTLDHFRELSILFESYKNLGGNGE 120

Query: 121 IEALFEKFKQLPIEED 136

+EAL+EK+K+LPI E+
Sbjct: 121 VEALYEKYKKLPIREE 136

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 2508 (GBS118) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 32 (lane 5; MW 42kDa).

GBS118-GST was purified as shown in Figure 198, lane 8.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 830

A DNA sequence (GBSx0882) was identified in S.agalactiae <SEQ ID 2509> which encodes the amino 
acid sequence <SEQ ID 2510>. Analysis of this protein sequence reveals the following: 

Possible site: 53
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8667> and protein <SEQ ID 8668> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 6.58
GvH: Signal Score (-7.5): -0.49
Possible site: 53
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 0 value: 12.15 threshold: 0.0
PERIPHERAL Likelihood = 12.15 84
modified ALOM score: -2.93

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 2510 (GBS56) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 8; MW 9.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 21 (lane 10; MW 34.9kDa). 

GBS56-GST was purified as shown in Figure 195, lane 7.

Example 831

A DNA sequence (GBSx0883) was identified in S.agalactiae <SEQ ID 2511> which encodes the amino
acid sequence <SEQ ID 2512>. Analysis of this protein sequence reveals the following: 

Possible site: 40
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 832 

A DNA sequence (GBSx0884) was identified in S.agalactiae <SEQ ID 2513> which encodes the amino 
acid sequence <SEQ ID 2514>. This protein is predicted to be N-acetylmuramoyl-L-alanine amidase. 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0342 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB07986 GB:Z93946 N-acetylmuramoyl-L-alanine amidase 
[bacteriophage Dp-1] 
Identities = 96/141 (68%), Positives = 118/141 (83%) 

Query: 1

Sbjct: 1

Query: 61 DWLIKKK3YELIAENVDWATOGDIAIWGMRGHSSGAGGHVVMFIDPENIIHCNWANNGIT 120

WLI+NGYELI+EN W+A RGDI IWG +G S+GAGGH MFID +NIIHCN+A +GI +

Sbjct: 61 AWLIENGYELISENAPWDAKRGDIFIWGRKGASAGAGGHTGMFIDSDNIIHCNYAYDGIS 120

Query: 121 VNNYNQTAAASGWMYCYVYRL 141
VN++++ +G Y YVYRL
Sbjct: 121 VNDHDERWYYAGQPYYYVYRL 141

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8669> and protein <SEQ ID 8670> were also identified. Analysis of this
protein sequence reveals the following:

RGD motif 81-83

The protein has homology with the following sequences in the databases:

58.2/72.9% over 182aa

GP|1934766| N-acetylmuramoyl-L-alanine amidase {bacteriophage Dp-1} Insert characterized

ORF00875(301 - 1044 of 2004)

GP|1934766|emb|CAB07986.1||Z93946(1 - 183 of 296) N-acetylmuramoyl-L-alanine amidase
{bacteriophage Dp-1}
%Match = 15.5
%Identity = 58.2 %Similarity = 72.8

Matches = 107 Mismatches = 49 Conservative Sub.s = 27

234 264 294 324 354 384 414 444 

LQKYinHMSDDDLTLFVESAVTCQMHDAWKE*PMEIOT^ 
50 I :: I :||| I I : I : I | I | I I : I | I : I I I I I I I : I I I I I I I I I I 

MGTOIEKGVATOQARKGRVSYSMDFRDGPDSYDCSSSMYYADRSAGAS 



SAGWAVNTEYMHDWLIKNGYELIAEIWDWNAWGDIAIKG^GHSS 

iiinimm iii:iiiiii = n i = i mi in = i 1=11111 1111 =111111 = 1 =11 = 11== = = 

SAGWAVNTEYMHAV&IENGYELISENAPWDAKRGDIFIKGRKGASAGAGGHTGMFIDSD^ 
60 70 80 90 100 110 120

714 744 774 804 834 864 894 924 

AASGWMYCYVYRLKSGASTQGKSLDTLVKETIAG^^ 
5 :| I HIM : 

YYAGQPYYYVYRLTNA- - - - 

140 

954 984 1014 1044 1074 1104 1134 1164 

1 0 GNGEaRKKSLGSQYDAVQKRVTELLKKQPSEPFKAQEVNKPTETKTSQTELTGQflTATKEEGDLSFNGTILKKAVLDKIL 

I ■■ «l IN II I > I' I II I ■■ ■■ 11 = 

~NAQPAEKKLGWQKDATCFWYARANGTYPKDEFEY~EENKSWFYFDDQGYM 

160 170 180 190 200 210 220 

SEQ ID 8670 (GBS302) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 50 (lane 6; MW 55kDa).

The GBS302-His fusion product was purified (Figure 205, lane 6) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 302), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 833 

A DNA sequence (GBSx0885) was identified in S.agalactiae <SEQ ID 2515> which encodes the amino 
acid sequence <SEQ ID 2516>. Analysis of this protein sequence reveals the following: 

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1509 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 834 

A DNA sequence (GBSx0886) was identified in S.agalactiae <SEQ ID 2517> which encodes the amino 
acid sequence <SEQ ID 2518>. Analysis of this protein sequence reveals the following: 

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1264 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13473 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 25/68 (36%) , Positives = 41/68 (59%) 

Query: 4 IENLIIAIVKPLISQPDQLTIKIQDGPEFLEYHLDLDTQDIGRVIGKKGRTITAIRSIVY 63

+E+LI+ IV PL+ PD + + ++ + + L + D G+VIGK+GRT AIR+ V+ 
Sbjct: 6 LEDLIVHIVTPLVDHPDDIRVIREETDQKIALRLSVHKSDTGKVIGKQGRTAKAIRTAVF 65 

Query: 64 SVPTQGKK 71 

+ Q K 
Sbjct: 66 AAGVQSSK 73 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2519> which encodes the amino acid 
sequence <SEQ ID 2520>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1012 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 72/79 (91%), Positives = 75/79 (94%) 

Query: 1 MDTIENLIIAIVKPLISQPDQLTIKIQDGPEFLEYHLDLDTQDIGRVIGKKGRTITAIRS 60
MDTIENLIIAIVKPLISQPD LTIKI+D P+FLEYHLDLD QDIGRVIGKKGRTITAIRS

Query: 61 IVYSVPTQGKKVRLIIDEK 79 

IVYSVPT GKKVRL+IDEK 
Sbjct: 61 IVYSVPTLGKKVRLVIDEK 79 
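
The "Identities" and "Positives" figures quoted for an alignment can be recomputed from its middle (match) line, in which a letter marks an identical residue and "+" marks a conservative substitution. The sketch below is illustrative only: the function names are hypothetical, and the use of truncating integer division is an assumption inferred from the figures in this document (75/79 is reported as 94%, not 95%).

```python
def blast_percent(count, length):
    """Percentage as printed in this document: truncated, not rounded (assumed)."""
    return (100 * count) // length


def score_match_line(match_line, aligned_length):
    """Count identities (letters) and positives (letters plus '+') in a match line."""
    identities = sum(ch.isalpha() for ch in match_line)
    positives = identities + match_line.count("+")
    return (
        identities,
        positives,
        blast_percent(identities, aligned_length),
        blast_percent(positives, aligned_length),
    )


# Final row of the GAS/GBS alignment above (residues 61-79, 19 columns).
row = score_match_line("IVYSVPT GKKVRL+IDEK", 19)
print(row)

# Whole-alignment totals as reported: Identities = 72/79, Positives = 75/79.
print(blast_percent(72, 79), blast_percent(75, 79))
```

Summing the per-row counts over every row of an alignment should give the totals quoted in its header, which offers a simple check on rows damaged in transcription.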

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 835 

A DNA sequence (GBSx0887) was identified in S.agalactiae <SEQ ID 2521> which encodes the amino 
acid sequence <SEQ ID 2522>. This protein is predicted to be ribosomal protein S16 (rpsP). Analysis of
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3654 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06202 GB:AP001515 ribosomal protein S16 (BS17) [Bacillus halodurans] 
Identities = 62/90 (68%) , Positives = 73/90 (80%) 

Query: 1 MAVKIRLTRMGSKKKPFYRINVADSRAPRDGRFIETVGTYNPLVAENQVTIKEERVLEWL 60

MAVKIRL RMGSKK PFYR+ VADSR+PRDGRFIE +GTYNPL +V +KE+R L+W+ 
Sbjct: 1 mVKIRLKIMGSKJffiPFyRVWADSRSPRDGRFIEEIGTYNPLTQPAiaTELKEDRALDWM 60 

Query: 61 SKGAQPSDTVRNLLSKAGVMTKFHDQKFSK 90 

KGA+PSDTVRNL SKAG+M K H+ K K 
Sbjct: 61 LKGAKPSDTVRNLFSKAGLMEKLHNAKNEK 90 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2523> which encodes the amino acid 
sequence <SEQ ID 2524>. Analysis of this protein sequence reveals the following: 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3654 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 86/90 (95%) , Positives = 89/90 (98%) 

Query: 1 MAVKIRLTRMGSKKKPFYRINVADSRAPRDGRFIETVGTYNPLVAENQVTIKEERVLEWL 60

MAVKIRLTRMGSKKKPFYRINVADSRAPRDGRFIETVGTYNPLVAENQ+TIKE+RVLEWL 
Sbjct: 1 MAVKIRLTRMGSKKKPFYRINVADSRAPRDGRFIETVGTYNPLVAENQITIKEDRVLEWL 60 

Query: 61 SKGAQPSDTVRNLLSKAGVMTKFHDQKFSK 90 

SKGAQPSDTVRN+LSKAGVM KFHDQKFSK 
Sbjct: 61 SKGAQPSDTVRNILSKAGVMAKFHDQKFSK 90 
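
Each alignment row carries a start and an end coordinate, so the number of residues printed between them (ignoring gap characters) should equal end - start + 1. This gives a quick consistency check on any row. The sketch below is a hypothetical helper, not part of the analysis described here; the truncated string in the last call is constructed purely for illustration.

```python
def row_is_consistent(start, sequence, end):
    """True when the residue count (letters only, gaps ignored) matches the coordinates."""
    residues = sum(ch.isalpha() for ch in sequence)
    return end - start + 1 == residues


# Sbjct rows of the GAS/GBS alignment above.
first = row_is_consistent(
    1, "MAVKIRLTRMGSKKKPFYRINVADSRAPRDGRFIETVGTYNPLVAENQITIKEDRVLEWL", 60
)
second = row_is_consistent(61, "SKGAQPSDTVRNILSKAGVMAKFHDQKFSK", 90)

# Constructed example: the same row with four residues dropped fails the check.
damaged = row_is_consistent(61, "SKGAQPSDTVRNILSKAGVMAKFHDQ", 90)
print(first, second, damaged)
```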

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 836 

A DNA sequence (GBSx0888) was identified in S.agalactiae <SEQ ID 2525> which encodes the amino 
acid sequence <SEQ ID 2526>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.09 Transmembrane 22 - 38 ( 16 - 42) 
INTEGRAL Likelihood = -7.64 Transmembrane 382 - 398 ( 375 - 402) 
INTEGRAL Likelihood = -7.59 Transmembrane 291 - 307 ( 284 - 317) 
INTEGRAL Likelihood = -4.94 Transmembrane 340 - 356 ( 335 - 366) 

Final Results 

bacterial membrane --- Certainty=0.5437 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
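
The INTEGRAL lines above report one predicted membrane-spanning segment each, with a likelihood score and two residue ranges. A minimal sketch for collecting them is given below. The function name and the dictionary keys `span` and `bracketed` are hypothetical labels, chosen here only to keep the two ranges apart without interpreting them.

```python
import re

# Assumed layout: "INTEGRAL Likelihood = <score> Transmembrane a - b ( c - d)".
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(text):
    """Return the reported segments, most negative likelihood first."""
    segments = []
    for m in _TM.finditer(text):
        segments.append(
            {
                "likelihood": float(m.group(1)),
                "span": (int(m.group(2)), int(m.group(3))),
                "bracketed": (int(m.group(4)), int(m.group(5))),
            }
        )
    return sorted(segments, key=lambda s: s["likelihood"])


report = """
INTEGRAL Likelihood =-11.09 Transmembrane 22 - 38 ( 16 - 42)
INTEGRAL Likelihood = -7.64 Transmembrane 382 - 398 ( 375 - 402)
INTEGRAL Likelihood = -7.59 Transmembrane 291 - 307 ( 284 - 317)
INTEGRAL Likelihood = -4.94 Transmembrane 340 - 356 ( 335 - 366)
"""
segments = parse_transmembrane(report)
for seg in segments:
    print(seg)
```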

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24912 GB:AF012285 YknZ [Bacillus subtilis] 
Identities = 161/417 (38%) , Positives = 241/417 (57%) , Gaps = 25/417 (5%) 

Query: 1 MENWKFALSS ILGHKMRAFLTMLGI I IGVASVVLIMAI^KGMKDSVTNEITKSQKNLQIY 60
+EN + ALSS+L HKMR+ LTMLGIIIGV SV++++A+G+G + + 1+ +++Y
Sbjct: 4 LENIRMALSSVIAHmRSILTMLGIIIGVGSVIVWAVGQGGEQi^KQSISGPGNTVELY 63

Query: 61    Sbjct: 64
Query: 120   Sbjct: 110
Query: 180   Sbjct: 169
Query: 240   Sbjct: 224

KI+SGR F + D+ +RV ++ +K+A+ LF 



IMT +IG+IA IS 



Query: 300 LLVGGIGVMNIMLVSVTERTREIGLRKALGATRRKILAQFLIESMVLTILGGLIGLLLAY 359 
LLVGGIGVMNIMLVSVTERTREIG+RK+LGATR +IL QFLIES+VLT++GGL+G+ + Y
Sbjct: 283 LLVGGIGVMNIMLVSVTERTREIGIRKSLGATRGQILTQFLIESVVLTLIGGLVGIGIGY 342

Query: 360 GGTMLIANAQDKITPS-VSLNVAIGSLIFSAFIGIIFGLLPANKASKLNPIDALRYE 415

GG L++ PS 4S V G ++FS IG+IFG+LPANKA+KL+PI+ALRYE

Sbjct: 343 GGAALVSAIAG- -WPSLISWQWCGGVTjFSMLIGVIFGMLPANKAAKLDPIEALRYE 397

There is also homology to SEQ ID 1350. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 837 

A DNA sequence (GBSx0889) was identified in S.agalactiae <SEQ ID 2527> which encodes the amino 
acid sequence <SEQ ID 2528>. This protein is predicted to be ABC transporter (ATP-bindingprot). 
Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4080 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06841 GB:AP001517 ABC transporter (ATP-binding protein) 
[Bacillus halodurans]
Identities = 131/218 (60%) , Positives = 169/218 (77%) 

Query: 8 LIRLHQIVKSYQNGDQKLQVLKNIDLTVYEGEFLAIMGPSGSGKSTLMNIIGLLDSPTSG 67 

+ I+L ++ KS++ G + +++L IDL + G+FLAIMGPSGSGKSTLMNIIG LD PTSG 
Sbjct: 1 MIKLERVTKSFRVGTEMVEILSAIDLEIASGDFLAIMGPSGSGKSTLMNIIGCLDQPTSG 60

Query: 68 DYSI^GKRvEELSQTIO^QvRNKEIGFVFQQFFLLSKiTALQNVELPLIYAGVPPKKRKN 127 

Y +GK + S+ ++A++RN+ IGFVFQQF LL +LTALQNVELP++YAG+ K+R 
Sbjct: 61 RYMFDGKDLTNYSEQEIAKIRNRHIGFVFQQFHLLPRLTALQNVELPIWYAGMKKKERTE 120 

Query: 128 LAKQFLDKVELRERIflSIHLPTELSGGQKQRVAIARALVNSPSIILADEPTGALDTKTGEQI 187 

A L++V L ERM +LP LSGGQKQRVAIAR++VN P+IILADEPTGALDTKT E I 
Sbjct: 121 RAAHALERVGIiAERMTYLPNSLSGGQKQRVAIARSIVNEPNIILADEPTGALDTKTSETI 180 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2529> which encodes the amino acid 
sequence <SEQ ID 2530>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.1739 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 182/232 (78%) , Positives = 207/232 (88%) 
Query: 5 RKELIRLHQIVKSYQNGDQlQ^QVLJCNIDLTATEGEFIiAIMGPSGSGKSTIjMNIIGLIiDSP 64 






Query: 65 TSGDYSMGKRVEELSQTKLRQVRNKEIGFVPQQFFLLSKLTALQNVELPLIYAGVPPKK 124 

TSGDY+L+ ++E L+ +LA+VRN EIGFVFQQFFLL+KLTALQNVELPIliIYAGV K 
Sbjct: 65 TSGDYTLHNTKIEILNDREIAKVRTO^ 124 

Query: 125 RKNIAKQFLDKVELREJMCraLPTELSGGQKQRVAIARALVNSPSIILADEPTGALDTKTG 184 

R+ AKQFL+KV L R+ HLP+ELSGGQKQRVAIARALVN PSIIIADEPTGALDTKTG 
Sbjct: 125 RREQAKQFLEKVGLGRRIKHLPSELSGGQKQRVAIARALVNDPSIILADEPTGALDTKTG 184 

Query: 185 EQIMQFLTELNQEGKTIIMVTHEPEIADYATRKIVIRDGEITADTTDSIRID 236 

+QIM+ LTELN+EGKTIIMVTHEPEIAD+ATRKI + IRDG+IT DTT S+ ID 
Sbjct: 185 QQIMELLTEJjNKEGKTIIMOTHEPEIADFATRKIIIRDGDITTDTTASWID 236 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 838 

A DNA sequence (GBSx0890) was identified in S.agalactiae <SEQ ID 2531> which encodes the amino
acid sequence <SEQ ID 2532>. This protein is predicted to be ATP-binding cassette transporter-like 
protein. Analysis of this protein sequence reveals the following: 



17 - 33 ( 13 - 

Final Results

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9965> which encodes amino acid sequence <SEQ ID 9966> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24909 GB:AF012285 YknX [Bacillus subtilis] 
Identities = 104/391 (26%) , Positives = 182/391 (45%) , Gaps = 21/391 (5%) 

Query: 13 KKGAIISGLSVALIWIGGFLWQSQPNKSAVKTNYKVFNVREGSVSSSTLLTGI<AKANQ 72
KK I G++V + + +G ++ + P + + +V E +SS+ ++ G K +
Sbjct: 2 KKVWIGIGIAVIV7ALFVGINIYRSAAPTSGSAGKEVQTGSVEENEISSTVMVPGTLKFSN 61

Query: 73 EQYVYFDAlWG^IRATm^WGDKITAGQQLVQYDTTTAQAAYDTANRQLNKVARQINNl ) K 132
EQYV+++A+KG + VK GDK+ G LV Y T Q + + QL + ++ +
Sbjct: 62

Query: 133   Sbjct: 120
Query: 193   Sbjct: 174
Query: 250   Sbjct: 228
Query: 310   Sbjct: 287



--ELKQTELQRQ 173 



I- S++ GTV+ VN 



YD VKK Q V + S V K W+G +S -t 
Query: 369 DAKTQEILSGLKAGQIWTNPSKTFKDGQKI 399

EI GL V+ NPS DG ++ 

Sbjct: 345 TDDLTEIKEGLTQDDQVILNPSDQVTDGMEV 375 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2533> which encodes the amino acid 
sequence <SEQ ID 2534>. Analysis of this protein sequence reveals the following: 

Possible site: 42
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.61 Transmembrane 15 - 31 ( 11 - 36)

Final Results

bacterial membrane --- Certainty=0.4843 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 16    Sbjct: 6
Query: 76    Sbjct: 64
Query: 136   Sbjct: 119
Query: 193   Sbjct: 179
Query: 252   Sbjct: 239
Query: 312   Sbjct: 292
Query: 371   Sbjct: 350



L V G L+EYD VK GQ 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 234/421 (55%) , Positives = 301/421 (70%) , Gaps = 19/421 (4%) 

Query: 3 MSKRQNLGISIOCGAIISGLSVALIWIGGF-LWVQSQPNKSA--VKTNyKVENVREGSVS 59 

MSKR + 1+ K +1+ + L+++I G LW Q + +A K Y +V EGS++ 
Sbjct: 1 MSKRGKIKITTKTKL I TASVI TLVLI I TGI VLW KQQRNTLTADIAKEPYSTVS VTEGS IA 60 

Query: 60 SSTLLTGKAKAWQEQYVYFDANKGNI^TVTvKVGDKITAGQQLVQYDTTTAQARYDTANR 119

SSTLL+G KA E+Y+YFDANKGN ATVTVKVGD++T GQQLVQY+TTTAQ+AYDTA R 
Sbjct: 61 SSTLLSGTVKALSEEYIYFDANKGNDATVTVKVGDQOT 120 

Query: 120 QIiNKVARQINNLKTTGSLPAOTSSDQSSSSSQGC^TQSTSGATNRLQQNYQSQANASYNQ 179 

LNK+ RQIN+LKT G +PA+ S++ + + G+ T +T + +Q NA+Y Q 

Sbjct: 121 SLNKIGRQINHLKTYG-VPAV- STETNRDEATGEETTTTVQPS- - AQQNANYKQ 170 

Query: 180 QLQDUTOAYADAQAEVNKAQKALNDTVITSDVSGTVVEWSDIDPASKTSQvLVHVATEG 239 
QLQDLNDAYADAQAEVNKAQ ALNDTV+ S VSGTWEVN+DIDP+SK SQ LVHVATEG
Sbjct: 171 QLQDLSTOAYADAQAEVNKAQIAIJro^ 230 

Query: 240 K1QVQGTMSEYDLANVKKDQAVKIKSKVYPDKRWEGKISYISKWP-EAEM1N NDS 293 

+LQV+GT++EYDLANVK Q+VKIKSKVY ++EW GKISY+SNYP E+ A + +

Sbjct: 231 QLQVKGTLTEYDLANVKVGQSVKIKSKVYSNQEWTGKISYVSNYPTESNAGSTTPAGSTG 290 

Query: 294 mGSSAVmKIim)ITSPLVJ^KQGFT^SVEVVNGI)mhIV9TSSVJN}n)NmFVmYm 353 
S+ Y YK+DI SPL+ LKQGFTVSVEVVN K +VP ++VI KD KH+VW Y+D 
Sbjct: 291 AGSSTGATYDYKIDIISPIiNQLKQGFTVSVEVVNEAKQALVPI.TAVIKKDKKHYVVJTYDD 350

Query: 354 StTOKISKVEVKIGKADAKTQEILSGLKAGQIWrorPSKTFBCDGQKIDNIESIDLNSNKKSE 414 

+ K KVEV +G ADA+ QEI G+ G IV+ NP K K +K++ + SI N+ + + 
Sbjct: 351 ATGKAKKVEVTLGNADAQQQEIHKGVAVGDIV1ANPDKNIKPDKKLEGVISIGTNTKPEKD 411 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 839 

A DNA sequence (GBSx0891) was identified in S.agalactiae <SEQ ID 2535> which encodes the amino
acid sequence <SEQ ID 2536>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1832 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 840 

A DNA sequence (GBSx0892) was identified in S.agalactiae <SEQ ID 2537> which encodes the amino
acid sequence <SEQ ID 2538>. This protein is predicted to be carbamoyl-phosphate synthase, pyrimidine-
specific, large chain, putati. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have an uncleavable N-term signal seq
Likelihood = -1.70



Final Results 

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA91005 GB:Z54240 carbamoyl-phosphate synthase [Lactobacillus 

Identities = 117/417 (28%) , Positives = 205/417 (49%) , Gaps = 37/417 (8%) 

Query: 122 FVQVDCLVMRDSLNNCLYVSDLEYIES-NKTTGKSLAIVPSQTLSDAARQTIRDVAFDVC 180

+ +++ VMRD+ +N + V ++E + TG S+ P QTL+D Q +RD A + 

Sbjct: 213 YKEIEFEVMRDAADNAMWCNMENFDPVGIHrGDSIVYAPVQTLADREVQLLRDAALKII 272 

Sbjct: 273
Query: 241   Sbjct: 333
Query: 284   Sbjct: 393
Query: 340   Sbjct: 453
Query: 400
Query: 450   Sbjct: 573

:iSLSSGLSHQSIL?3TITTYPVLEIATKLTVGYTFS 240 

II ++ +S S L T YP+ ++A K+ VG 

: I E VNPRVS RS S ALAS KATGYPIAKMAAKIAVGLHLD 332 



KIN F+LDK LHI+E+ + L +++ AKR GF+D +A 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 841 

A DNA sequence (GBSx0893) was identified in S.agalactiae <SEQ ID 2539> which encodes the amino 
acid sequence <SEQ ID 2540>. This protein is predicted to be carbamoyl phosphate synthetase small 
subunit (carA). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2709 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



>GP:CAB89872 GB:AJ132624 carbamoyl phosphate synth

subunit [Lactococcus lactis 



Identities = 188/352 (53%) , Positives = 265/352 (74%) 

Query: 1 mKKLLIIiEDGTVFEGLSFGSSLDVTGEnVFCTGNTGYQEIITNPSHNGKIIjVFTSPLIG 60 

M+K+LLILEDGT+FEG 4 G++LDVTGELVF TG TGYQE IT+ S+NG+IL FT P++G 
Sbjct: 1 MSJCRLrjILEDGTIFEGEALGANLDVTGELVFNTGMTGYQESITDQSYNGQIIjTFTYPIVG 60 

Query: 61 NYGIHRSYSEAIIPTCLGVWAEYSRCVSSDTSICMNLDEFLKMKKVPAMSGVDTRYLMQV 120 

NYG++R E+I PTC WV E +R S+ +M+ DEFLK K +P ++GVDTR + ++ 
Sbjct: 61 NYGVNRDDYESIHPTCKAVVVHEAARRPSlWmMQMSFDEFLKSI<NIPGITGVDTRAITKI 120 

Query: 121 IKEKGFVK&TLAEAGDVLSHLQDQLIATVLPTMNV^ 180 

++E G +KA+L +A D + H QL ATVLPTN VE ST TAYPSP +GR +W+DFGL 
Sbjct: 121 WEHGTMKASLVQARDEVDHQMSQLQAn^PTNQTOTSSTATAYPSPNTGRKVVWDFGL 180 

Query: 181 KHSILRELSKRQCDVTVIPYNTSLEGIKNLYPEGIILSNGPGNPEKLQEILNTIKELQKS 240 

KHSILRELSKR+C++TV+PYNTS + I + P+G++L+NGPG+P + E + IKE+Q 
Sbjct: 181 KHSILRELSKRECM^TWPYNTSAKEILEMEPDGWLTNGPGDPTDVPEAIEMIKEVQGK 240 



Query: 241 VPMLGIGLGHQLIA^™GAEIMRLPvAKKGFWPMRDIATGRLETVSQFNHFTvmImP 300
+P+ GI LGHQL ++ANGA ++ +G N+ +R++ATGR++ SQ + + V+ NLP
Sbjct: 241 IPIFGICLGHQLFSLANGATTYKMKFGHRGFNHAVREVATGRIDFTSQNHGYAVSSENLE 300

Query: 301 HDLLVTHEGIJS1DQEIVAI.RHRSFPVMSVQFYPEAAPGPHDVTYFFDEFLEMI 352
DL++TH +MD 4- +RH+ FP SVQF+P+AAPGPHD +Y FD+F++++
Sbjct: 301 EDLMITHVEINDNSVEGVRHKYFPAFSVQFHPDAAPGPHDASYLFDDFMDLM 352

There is also homology to SEQ ID 2030. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 842

A DNA sequence (GBSx0894) was identified in S.agalactiae <SEQ ID 2541> which encodes the amino 
acid sequence <SEQ ID 2542>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3646 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9967> which encodes amino acid sequence <SEQ ID 9968> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



>GP:CAB89869 GB:AJ132624 pyrimidine regulatory protein [Lactococcus 
lactis] 

Identities = 127/169 (75%), Positives = 147/169 (86%) 

Query: 13 MKRKEIIDDWMKRAITRITYEIIEraKNLDOTVLAGIKTRGVFLAKRIQERLKQLENLD 72 

M RKEIID++TMKRAITRITYEIIERNK LD +VL GIKTRGV+LAKRIQERL+QLE L+ 
Sbjct: 1 MARKEIIDEITMKRAITRITYEIIERNKELDKLVLIGIKTRGVYLAKRIQERLQQLEGLE 60

Query: 73 IPVGELDTKPFRDDMKVEVDTTTMPVDITDKDIILIDDVLYTGRTIRAAIDNLVSLGRPS 132

IP GELDT+PFRDD + + DTT + +DIT KD+IL+DDVLYTGRTIRAAID +V LGRP+
Sbjct: 61 IPFGELDTRPFRDDKQAQEDTTEIDIDITGKDVILVDDVLYTGRTIRAAIDGIVKLGRPA 120

Query: 133 RVSLAVLIDRGHRELPIRADYVGKNIPTSQFEEILVEVMEHDGYDRVSI 181

RV LAVL+DRGHRELPIRADYVGKNIPT EEI+V++ EHDG D + I 
Sbjct: 121 RVQEAVLVDRGHRELPIRADYVGKNI PTGHDEEI I VQMSEHDGNDS ILI 169 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2543> which encodes the amino acid 
sequence <SEQ ID 2544>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3870 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 147/171 (85%), Positives = 158/171 (91%) 

Query: 13 MKRKEIIDDVTMKRAITRITYEIIERNKHLDNIVLAGIKTRGVFLAKRIQERLKQLENLD 72
MK KEI+DDVTMKRAITRITYEIIERNK LDN+VLAGIKTRGVFLA+RIQERL QLE LD
Sbjct: 1 MKTKEIVDDVTMKRAITRITYEIIERNKQLDNVVLAGIKTRGVFLARRIQERLHQLEGLD 60

Query: 73 IPVGELDTKPFRDDMKVEVDTTTMPVDITDKDIILIDDVLYTGRTIRAAIDNLVSLGRPS 132

+P+GELD KPFRDDM+VE DTT M VDIT KD+ILIDDVLYTGRTIRAAIDNLVSLGRP+ 
Sbjct: 61 LPIGELDIKPFRDDMRVEEDTTLMSVDITGKDVILIDDVLYTGRTIRAAIDNLVSLGRPA 120 

Query: 133 RVSLAVLIDRGHRELPIRADYVGKNIPTSQFEEILVEVMEHDGYDRVSIID 183

RVSLAVL+DRGHREDPIRADYVGKNIPTS EEI+VEV+E DG DRVSIID 
Sbjct: 121 RVSLAVLVDRGHREDPIRADYVGKNIPTSSVEEIWEWEVDGRDRVSIID 171 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 843 

A DNA sequence (GBSx0895) was identified in S.agalactiae <SEQ ID 2545> which encodes the amino 
acid sequence <SEQ ID 2546> (rluD). Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0687 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9969> which encodes amino acid sequence <SEQ ID 9970> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06261 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 178/290 (61%), Positives = 216/290 (74%), Gaps = 2/290 (0%) 

Query: 17 GVR^KAL-ADNSELSRSQANEEIKKGIVLWGQVHCA;<yTVQEGDRITFDIPKEEVLDY 75 

G R+DK LA E SR+Q + IK G VL+NG+ K> Y V+ GD + +P+ EVL+ 
Sbjct: 15 GERIDKFLTAQGEEWSRTQVQQWIKDGHVLINGRTIKSNYKvETGDTLELFVPEPEVLEV 74 

Query: 76 QAENIPLDIIYQDDDVAWNKPQGI4WHPSAGHSSGTLVNALMYHIKDLSSINGWRPGI 135 

ENIP++IIY+D+DVAWNKP+GMWHP+ GH++GTLVNALMYH DLSSINGWRPGI 
Sbjct: 75 VPENIPIEIIYEDEDVAVVI'IKPRGMVvHPAPGHTTGTLVN7U^MYHCNDLSSINGVVRPGI 134 

Query: 136 VHRIDKDTSGLLMVAKNDRAHQVLAEELKDKKSL 195 

VHRIDKDTSGLLM+AKNDRAH+ L ttt K 4 R Y A1VHGN+P+D G I+APIGR 
Sbjct: 135 VHRIDKDTSGLLMIAKNDRAHESLVNQLKaKTTERVYQAIvHGNIPHDHGTIDAPIGRDK 194 

Query: 196 KDRKKQA.VTAK-GKPAITRFHVLERFGDYTLVELSLETGRTHQIRVHI>1AYIGHPLAGDPV 254 

DR+ VT + + A+T F VLERFGD+T VE LETGRTHQIRVH YIG PLAGDP 
Sbjct: 195 VDRQSMTVTEENSRDAVTHFTVLERFGDFTFVECQLETGRTHQIRVHFKYIGFPLAGDPK 254 

Query: 255 YGPRKTLGGKGQFLHAQTLGFTHPSNGENLIFSVEVPEIFQTTLEKLRKN 304 

YGP+KTL GQ LHAQ LGF HP GE + F VE+PE + + +L+ N 
Sbjct: 255 YGPKKTLSIDGQALHAQKLGFEHPRTGEFKRFKVEMPSEMKKLIRQLQNN 304 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2547> which encodes the amino acid 
sequence <SEQ ID 2548>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2455 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 239/295 (81%), Positives = 265/295 (89%)

Query: 9 MEITIKIAGTOLDKALADNSELSRSQBNEEIKKGIVLVNGQVKKflKYTVQEGDRITFDIP 68 

ME I + +G RLDKALAD S LSR QRN++1K+G+VLVNGQ KKAKYTVQ GD I F++P 
Sbjct: 1 MEIWITSGQRLDKALOTLSPLSRGQAMXJIKQGLVLVNGQQKKaKYTVQAGDVICFELP 60 

Query: 69 KEEVLDYQAENIPLDIIYQDDDVAVVNKEQGMVVHPSAGHSSGTLVHALMYHIKDLSSIN 128 

KEEVL+YQA+NI PLDI I Y+DD +A++NKPQGMWHPSAGH SGT+VNALMYHIKDLSSIN 
Sbjct: 61 KEEVLEYQAQNI PLDI I YEDDALAI INKPQGMWHPSAGHPSGTMVNALMYHI KDLS SIN 120 

Query: 129 GVVRPGIVHRIDKDTSGLLWAKNDRAHQVLAEELKDKKSLRKYIAIVHGNLPNDRGVIE 18B 

GWRPGIVHRIDKDTSGLLMVAK D AHQ LAEELK KKSLRKYLAIVHGNLPNDRG+IE 
Sbjct: 121 GVVRPGIVHRIDKDTSGLLMVAKTDAAHQALAEELKAKXSLRKYLAIVHGNLPNDRGMIE 180 

Query: 189 APIGRSDKDRKKQAVTAKGKPAITRFHVLERFGDYTLVELSLETGRTHQIRVHMAYIGHP 248 

APIGRS+KDRKKQAVTAKGK A+TRF VLERFGDY+LVEL LETGRTHQIRVHMAYIGHP 
Sbjct: 181 APIGRSEKDRKKQAVTAKGKEA^/TRFTVLERFGDYSLVELQLETGRTHQIRVHMAYIGHP 240 

Query: 249 LAGDPVYGPRKTLGGKGQFLHAQTLGFTHPSNGENLIFSVEVPEIFQTTLEKLRK 303 

4-AGDP+YGPRKTL G GQFLHA+TLG THP G+ +IF+VE PEIFQ L+ LRK 
Sbjct: 241 VAGDPLYGPRKTLSGHGQFLHAKTLGLTHPKTGKEMIFTVEAPEIFQKVLKLLRK 295 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 844 

A DNA sequence (GBSx0896) was identified in S.agalactiae <SEQ ID 2549> which encodes the amino 
acid sequence <SEQ ID 2550>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0496 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD53064 GB:AF163833 CpsY [Streptococcus agalactiae]
Identities = 105/297 (35%), Positives = 163/297 (54%), Gaps = 4/297 (1%)

Query: 1 MNIQQLRYWAIANSGTFREAAAKLFVSQPSLSVAVRDLETELGFQIFTRTTTGAVLTNQ 60
M IQQL+YV+ I +G+ EAA +L+++QPSLS AVR+LETE+G QIF R G LT
Sbjct: 1 MRIQQLQYVIKIVETGSMNEAAKQLYITQPSLSNAVRNLETEMGIQIFIRNPKGITLTKD 60

Query: 61 GMTFYENALEWKSFDSFEKQFSQSEATEQEFSIASQHYDFLPPLITAFSKCNDNFSY-F 119
GM F A ++++ E+++ + + FS++SQHY F+ A D Y

[The remaining rows of this alignment (query positions 120 onward, subject positions 61 onward) are not legible in the source; only the match-line fragment "+PIA+4 L M E" survives.]



A related DNA sequence was identified in S.pyogenes <SEQ ID 2551> which encodes the amino acid
sequence <SEQ ID 2552>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1252 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/296 (73%), Positives = 253/296 (85%)

Query: 1 MNIQQLRYWAIANSGTFREAAAKLFVSQPSLSVAVRDLETELGPQIFTRTTTGAVLTNQ 60 

MNIQQLRYWAIAN+GTFREAA+KLFVSQPSLSV+++DLE ELGFQIF RTT+G VLT+Q 
Sbjct: 1 MNIQQLRYWAIANNGTFREAASKLFVSQPSLSVSIKDLEAELGFQ1FNRTTSGTVLTSQ 60 

Query: 61 GMTFYENALEWKSFDSFEKQFSQSEATEQEFSIASQHYDFLPPLITAFSKCNDNFSYFR 120

G+ FYE ALEWKSFDSFEK FSQ++ + EFSIASQHYDFLPPLITAFS+ D FR 
Sbjct: 61 GLVFYEKALEWKSFDSFEKTFSQAELDQNEFSIASQHYDFLPPLITAFSQQYDGHRVFR 120 

Query: 121 IFESTTIRILDEVAQGNSEIGIIYINSQNKKGLLQRLDKLGLEFVELIPFKTHIYLGKDH 180 

IFESTTI+ILDEVAQGNSEIGI I Y+N N+KGL QR+DKLGLE+V LIPF THIYL K H 
Sbjct: 121 IFESTTIQILDEVAQGNSEIGI1YLNVDNQKGLFQRMDKLGLEYVSLIPFTTHIYLSKTH 180 

Query: 181 PIASKTSLIMTDLEGLPTTOFTQDRDDYRYYSENF\ r EVLDSSVTYNVTDRATIjNGILERT 240 

PLA++ +L + D++GLP VRFTQ+RD+Y YYSENFV+ + YNV+DRATLNGILERT 
Sbjct: 181 PLMTREALYLNDIQGLPAVRFTQERDEYLYYSENFVDTSECPRIYNVSDRATIiNGILERT 240 

Query: 241 QAYATGSGFLDSRSTOGITVIPLEDHIjDNQMIYIKRKDRNLSQMALKFVAVMEEYF 296 

A+ATGSGFLD RSVNGI VIPL DH+DNQMrY+KRKD+NLS FV ++++YF 

Sbjct: 241 NAFATGSGFLDIIRSTOGIKVIPI^HIDNQMIYVKRKDIOJLSUAGATFVTILKDYF 296 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 845 

A DNA sequence (GBSx0897) was identified in S.agalactiae <SEQ ID 2553> which encodes the amino 
acid sequence <SEQ ID 2554>. This protein is predicted to be 50S ribosomal protein L27 (rpmA). Analysis 
of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0976 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 8 NLQLFAHKKGGGSTSNGRDSQAKRLGAKAADGQTVSGGSrLYRQRGTHIYPGANVGRGGD 67 

+LQ FA KKG GST NGRDS+AKRLGAK ADGQ V+GGSILYRQRGT IYPG NVGRGGD 
Sbjct: 5 DLQFFASKKGVGSTKNGRDSEAKRLGAKRADGQFVTGGSILYRQRGTKIYPGEWGRGGD 64 

Query: 68 DTLFAKVEGWRFERKGRDKKQVSVYPIAK 97 

DTLFAK++G V+FER GRD+K+VSVYP+-A+ 
Sbjct: 65 DTLFAKIDGTVKFERFGRDRKKVSVYPVAQ 94 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2555> which encodes the amino acid 
sequence <SEQ ID 2556>. Analysis of this protein sequence reveals the following: 
Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0976 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 95/97 (97%), Positives = 96/97 (98%)

Query: 1 MLKMN]^ANLQLFAHKKGGGSTSNGPI3SQAKRLGAKAADGQTVSGGSILYRQRGTH^ 60 

MLKMNIiANLQLFAHKKGGGSTSNGRDSQiiKRLGAKAADGQTVSGGSIL 
Sbjct: 1 MLKMNLANLQLFAHKKGGGSTSNGRDSQAKRLGAKRADGQTVSGGSILYRQRGTHIYPGV 60 

Query: 61 NVGRGGDDTLFAKVEGVVRFERKGRDKKQVSvYPIAK 97 

NVGRGGDDTLFAKVEGWRFERKGRDKKQVSVYP+AK 
Sbjct: 61 NVGRGGDDTLFAKVEGWRFERKGRDKKQVSVYPVAK 97 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 846 

A DNA sequence (GBSx0898) was identified in S.agalactiae <SEQ ID 2557> which encodes the amino
acid sequence <SEQ ID 2558>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.75 Transmembrane 32 - 48 ( 32 - 48)

Final Results

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06729 GB:AP001517 unknown conserved protein in B. subtilis
[Bacillus halodurans]
Identities = 33/107 (30%), Positives = 63/107 (58%), Gaps = 4/107 (3%)

Query: 1 MIKATFTRNQSGYLYSAEISGHAGSGEYGFDVICAAVSTLSINFINSLEALTTCQAQLII 60
MI F RN+ + S +SGHA +G YG D++CA S +++ +N++ AL CQ +L+
Sbjct: 1 MIDWFERNKQM3IVSFTMSGRADAGPYGQDLVCAGASAVALGTVNAIIAL--CQVELVT 58

Query: 61 N-DVEGGYMKIDL-SSIPQHKEDKVQLLFESYLLGKTNLSKDSSEFV 105
+ EGG+++ + + + + +KVQLL E + + ++++ E +
Sbjct: 59 EMENEGGFLRCRVPNDLEETTFEKVQLLLEGMNISLQSIAESYGEHI 105

A related DNA sequence was identified in S.pyogenes <SEQ ID 2559> which encodes the amino acid 
sequence <SEQ ID 2560>. Analysis of this protein sequence reveals the following: 

Possible site: 52 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 32 - 48 ( 32 - 48)

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 
>GP:BAB06729 GB:AP001517 unknown conserved protein in B. subtilis
[Bacillus halodurans]
Identities = 33/109 (30%), Positives = 60/109 (54%), Gaps = 4/109 (3%) 

Query: 1 MIKAIFTRQKNGQLSSvTLTGHAGSGKHGFDIVCaSVSTIAINFVNSIiEvljADCQMjVDL 60 

MI +F R K 4 S T44GHA +G +G D+VCA S +A+ VN44 h 4 4 44 
Sbjct: 1 MIDVVFERNKQOTJIVSFTMSGHMJAGPYGQDLVCAGASAVALGTVNAIIALCQVELVTEM 60 

Query: 61 NDVEGGYMAITI P PHDNKEEVQLLFESFLLGMTSLAKDSSKFVNTQ 106 

4- EGG44 +P E+VQLL E + + S+A+ + + 4 

Sbjct: 61 EN-EGGFLRCRVPNDLEETTFEKVQLLLEGMKISLQSIASSYGEHIQIE 108 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 67/110 (60%), Positives = 90/110 (80%), Gaps = 2/110 (1%) 

Query: 1 MIKATFTRNQSGYLYSAEISGHAGSGEYGFDVICAAVSTLSINFINSLEALTTCQAQLII 60

MIKA FTR ++G L S 4+GHAGSG++GFD++CA+VSTL+INF+NSLE L CQA 4 4 
Sbjct: 1 MIKAIFTRQKNGQLSSOTLTGHAGSGKHGFDIVCASVSTLAINFVNSLEVLADCQALVDL 60 

Query: 61 NDVEGGYMKIDLSSIPQHKEDKVQLLFESYLLGMTNLSKDSSEFVSTWM 110 

NDVEGGYM I 4 P 444VQLLFES4LLGMT4L4KDSS4FV4T V4 
Sbjct: 61 NDVEGGYMAITI P - - PHDNKEEVQLLFESFLLGMTSIAKDSSKFVNTQVI 108 

SEQ ID 2558 (GBS433) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 78 (lane 4; MW 16kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 8; MW 41kDa). 

GBS433-GST was purified as shown in Figure 223, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 847

A DNA sequence (GBSx0899) was identified in S.agalactiae <SEQ ID 2561> which encodes the amino 
acid sequence <SEQ ID 2562>. This protein is predicted to be ribosomal protein L21 (rplU). Analysis of 
this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2972 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 4 YAIIKTGGKQVKVEVGQAIYVEKLDVHAGAEWFNEVVIjVGGETTKVGTPVVEGATWGT 63 

YAIIKTGGKQ4KVE GQ +Y4EKL EAG VTF 4V4 VGG+ KVG P VEGATV 
Sbjct: 2 YAIIKTGGKQIKVEEGQTWIEKIA I \FAGETWFEDVLFV3GDNVWGNP1VEGATVTAK 61 

Query: 64 VEKCGKQKKWSYKYKPKKGSHRKQGHRQPYTKWINAINA 104 

VEKQG4 KK4 44YKPKK H4KQGHRQPYTKV I INA 
Sbjct: 62 VEKQGRAKKITVFRYKPKKNVHKKQGHRQPYTKVTIEKINA 102 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2563> which encodes the amino acid 
sequence <SEQ ID 2564>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3026 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 97/104 (93%) , Positives = 101/104 (96%) 

Query: 1 MSTYAIIKTGmQWVEVGQAIYVEKlDvEACSAEVTFNE^ 60 

MSTYAIIKTGGKQVKVEVGQAIYVEK+D SAGAEVTFNEWLVGG+ T VGTPWEGATV 
Sbjct: 1 MSTYAIIKTGGKQVKOTVGQAIYVEKIDAEAt^VTFNEVVliVGGDKTWGTPVVEGATV 60 

Query: 61 VGTVEKQGKQKKWSYKYKPKKGSHRKQGHRQPYTKVVINAINA 104 

VGTVEKQGKQKKW++KYKPKKGSHRKQGHRQPYTKWINAINA 
Sbjct: 61 VGTVEKQGKQKKWTFKYKPKKGSKR.KQGHRQPYTKWINAINA 104 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 848 

A DNA sequence (GBSx0900) was identified in S.agalactiae <SEQ ID 2565> which encodes the amino 
acid sequence <SEQ ID 2566>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1032 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9369> which encodes amino acid sequence <SEQ ID 9370> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14809 GB:Z99118 excinuclease ABC (subunit C) [Bacillus subtilis] 
Identities = 221/373 (59%) , Positives = 288/373 (76%) 



Query: [row not legible]
Sbjct: 206 [row not legible]

Query: 61 [row not legible]
GKLI+RDV+MFP Y E +E+FLT+ IGQFY HFLPKE+ +P ID +E ++
Sbjct: 266 [row not legible]

Query: 121 [row not legible]
+P++G KK+L+ LA KNA+++L++KF L+E+D ++ GA++ LG LNI P RI AFD
Sbjct: 326 HQPKKGPKKELLMLAHKNAKIALKEKFSLIERDEERSIGAVQKLGEALNIYTPHRIVAFD 385

Query: 181 NSNIQGTSPVAAIWVFVNGKPSKKDYRKFKIKTVIGPDDYASMREVIHRRYSRVLKDGLT 240
NSNIQGT+PV+AM+VF++GKP KK+YRK+KIKTV GPDDY SMREV+ RRY+RVL++ L
Sbjct: 386 NSNICjGTNPVSAMIVFIDGKPYKKEYRKYKIKT\^GPDDYGSMREWRRRYTRVLRENLP 445

Query: 241 PPDLIVIDGGCjGQvNIARDVIFjNQFGIJ^PIAGLQKiroKHQTHELLFGDPLEVVELPRNS 300
PDLI+ IDGG+GQ+N ARDVIEN+ GL IPIAGL K++KH+T LL GDPLEV L RNS
Sbjct: 446 LPDLIIIDGGKGQINAI^DVIEIffil^LDIPIAGIiAKDEKHRTSNLLIGDPLEVAYI.ERNS 505

Query: 301 EEFFLLHRIQDEVHRFAITFHRQLRSKNSFSEKLDGITGDGPKRKQLLMKHFKSLPNIQK 360 
Query: 361 AEIEDIIMCGIPR 373

A +EDI G+P+ 
Sbjct: 566 ASLEDIKKAGVPQ 578 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2567> which encodes the amino acid 
sequence <SEQ ID 2568>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4332 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 289/385 (75%) , Positives = 334/385 (86%) 

Query: 1 MKSAAMTMEFERAAEYRDLIEAISLLRTKQRVIHQDMKDRDVFGYFVDKGWMCVQVFPVR 60 

M +A+ M FERAAEYRDLI 1+ +RTKQRV+ +D++DRD+FGY4VDKGWMCVQVFFVR 
Sbjct: 206 MIjAASI<EMAFERAREYRDLISGIATMRTKQRWSKDLQDRDIFGyYVDKGVMCVQVFFVR 265 

Query: 61 NGKLIQFJJvNMFPYYNEPEEDFLTYIGQFYQDTKHFLPKEVFIPQDIDAKSVETIVGCKI 120 

GKLIQRDVN+FPYY + EEDFLTY+GQFYQD +HF+PKSVFIP+ ID + V IV KI 
Sbjct: 266 QGKLIQRDVNLFPYYTDAEEDFLTYMGQFYQDKQHFIPKEVFIPEAIDEELVAAIVPTKI 325 

Query: 121 VTCPQRGEKKQLVNLAIKNARVSLQQKFDLIJSKDIRKTH^ 180 
+KP+RGEKKQLV LA KNARVSLQQKFDIiLEKDI+KT GAIENLG lib I KPVRIEAFD
Sbjct: 326 IKPKKGEKKQLVAIATKNARVSLQQKFDLI^KDIKKTSGAIENLGQLLRIDKPVRIEAFD 385

Query: 181 NSNIQGTSPVAAMWFVNGKPSKKDYRKFKIKTVIGPDDYASMREVIHRRYSRVLKDGLT 240 

NSNIQGTSPVAAMWFV+GKPSKKDYRKFKIKTV+GPDDYASMREV+ RRYSRV K+GL 
Sbjct: 386 NSNIQGTSPVAAMWFVDGKPSKKDYRKFKIKTWGPDDYASMREVLFRRYSRVKKEGLQ 445 

Query: 241 PPDLIVIDGGQGQVNIARDVIENQFGLAIPIAGLQKNDKHQTHELLFGDPLEVVELPRNS 300 

P+LI++DGG GQVN+A+DVIE Q GL IP+AGLQKNDKHQTH+IiLFG+PLEW LPR S 
Sbjct: 446 APNLIIVDGGVGQVNVAKDVIEKQLGLTIPVAGLQKNDKHQTHDLLFGNPLEWPLPRRS 505 

Query: 301 EEFFLLHRIQDEVHRFAITFHRQLRSKNSFSSKLDGITGLGPKRKQLLMKHFKSLPNIQK 360

EEFFLLHRIQDEVHRFA+TFHRQ+R KNSFSS LD I+GLGPKRKQLL++HFK++ I 
Sbjct: 506 EEFFLLHRIQDEVHRFAVTFHRQVRRKNSFSSTLDHISGLGPKRKQLLLRHFKTITAIAS 565 

Query: 361 AEIEDIIMCGIPRTVAESLRDSLND 385 

A E+I GIP+TV E+++ + D 
Sbjct: 566 ATSEEIQALGIPKTWEAIQQQITD 590 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 849 

A DNA sequence (GBSx0901) was identified in S.agalactiae <SEQ ID 2569> which encodes the amino 
acid sequence <SEQ ID 2570>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2491 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 850 

A DNA sequence (GBSx0902) was identified in S.agalactiae <SEQ ID 2571> which encodes the amino
acid sequence <SEQ ID 2572>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3349 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA86651 GB:AB033763 glycerophosphoryl diester phosphodiesterase
homologue [Staphylococcus aureus]
Identities = 50/202 (24%), Positives = 96/202 (46%), Gaps = 15/202 (7%)



Query: 1 MDVIMTKDHKLWIHDDNLKELSGI'WKDVSKIiTIjDQVTKIPIHQ GRFA- SHIPSFTE 56
+DV +TKD +L++IHDD L+R + M+ ++++L D++ +F H+P+F +
Sbjct: 36 LDVAITKDEQLIIIHDDYLERTTNMSGEITELNYDEIKDASAGSWFGEKFKDEHLPTFDD 95

Query: 57 FMKTAQSLDQKIMIELKPY-NQNL.DIYADEFIKEFKE LRLSTKHKVMSLNLTLIEK 111
+K A + + +ELK N + +K+ +E L + + + S N+ L++
Sbjct: 96 VVTCIANE YNMNIiNVELKG ITGPNGLALS KSMVKQVEEQLTNLNQNQEVL I SSFNWLVKL 155

Query: 112 VEKKLPQLDTGYLI PI , HWGTLQNH -NVDFYGIEEFSYNDWIAYLAQEYNKQLYVW 165
E+ +PQ + + W TL ++ N E+ + +E +L VW
Sbjct: 156 AEEIMPQYNRAVIFHTTSFREDWTLLDYCNAKIVNTEDAKLTKAKVKWKEAGYELNVW 215

Query: 166 TINRDNLMIRYLQSPVNGIITD 187
T+N+ + V+GI TD
Sbjct: 216 TVNKPARANQLANWGVDGIFTD 237





A related DNA sequence was identified in S.pyogenes <SEQ ID 2573> which encodes the amino acid 
sequence <SEQ ID 2574>. Analysis of this protein sequence reveals the following: 

Possible site: 36
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-12.26 Transmembrane 239 - 255 ( 227 - 260)
INTEGRAL Likelihood = -9.45 Transmembrane [..] - 96 ( 78 - [..])
INTEGRAL Likelihood = -9.13 Transmembrane 137 - 153 ( 131 - 160)
INTEGRAL Likelihood = -4.94 Transmembrane 278 - 294 ( 277 - 295)
INTEGRAL Likelihood = -3.56 Transmembrane [..] - [..] ( 33 - 55)
INTEGRAL Likelihood = -3.SS Transmembrane 188 - 204 ( 185 - 206)
INTEGRAL Likelihood = -3.35 Transmembrane 314 - 330 ( 310 - 331)


Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB12801 GB:Z99109 similar to glycerophosphodiester
phosphodiesterase [Bacillus subtilis]
Identities = 67/244 (27%) , Positives = 110/244 (44%) , Gaps = 14/244 (5%) 

Query: 344 VIAHRGLVSAGVENSLE^EGAKKAGSDYVELDLILTKDNHFWSHDNRLKRLAGVNKTI 403 

+ IAHRG EN++ A + A K +D +ELD+ LTKD W HD+R+ R + + 

Sbjct: 3 IIAHRGASGYAPENTIAAFDIAVKMNADMIELDVQLTKDRQIWIHDDRVDRTTNGSGEV 62 

Query: 404 RNLTLKEVEHLTSHQGH FSGRFVSFDTFYQKAKKLNMPLLIELKPIGTEPGNYVDLF 460 

++ TL+E++ L+ + FG+ K + LLIELK ++ G ++ 

Sbjct: 63 KDFTLEELQKLDAGSWYGPAFQGERIPTLEAVLKRYHKKIGLLIELKGHPSQVGIEEEVG 122 

Query: 461 LETYHRLGISKDNKVMSLDLEVIEAIKKKNPSITTGYIIPIQFGFFG DEFVDF 513 

+ + S +N V S ++ ++ PSI T I FG F ++ 

Sbjct: 123 -QLLGQFSFSIMMIVQSFQFRSVQRFRELYPSIPTAVITRPNFGMLSRMQMKAFRSFANY 181 

Query: 514 YVIEDFSWSYLSSQAFWNNI<EIYVV>ITINDP!CRIEHYLLKPIQGIITDQPALTNQIjIKDL 573 

1+ + N 1+ VJT+N+ K + GI+TD P + +IKD 
Sbjct: 182 WIKHTRLNRLMIGS INKNGLNI FAWrVHNQKTAAKLQAMGVDGIVTDYP DFI I KDG 238 

Query: 574 KQDN 577 
K +N 

Sbjct: 239 KHEN 242 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/215 (41%) , Positives = 136/215 (62%) 

Query: 1 MDVIMTKDHKLWIHDDNLKRLSGMNKDVSKLTLDQVTKIPIHQGRFASHIPSFTEFMKT 60

+D+I+TKD+ W HD+ LKRL+G+NK + LTL +V + HQG F+ SF F + 
Sbjct: 375 LDLILTKDNHFWSHDTOLKRLAGVNKTIRNIiTLKEVEHLTSHQGHFSGRFVSFDTFYQK 434 

Query: 61 AQSLDQKIMIELKPYNQNLDIYADEFIKEFKELRLSTKHKVMSINLTLIEKVEKKLPQLD 120 

A+ L+ ++IELKP Y D F++ + L +S +KVMSL+L +IE ++KK P + 

Sbjct: 435 AKKI^WPLLIELKPIGTEPGbreVDLFLETYHRLGISKDNKVMSIjDLEVIEAIKKKNPSIT 494 

Query: 121 TGYLIPLHWGTLQNHWTOFYGIEEFSYHDWIAYLAQEYNKQLYVWTINRDNLMIRYLQSP 180 

TGY+IP+ +G + VDFY IE+FSY +++ A NK++YVWTIN + YL P 
Sbjct: 495 TGYIIPIQFGFFGDEFVDFYVIEDFSYRSYDSSQAFWNNJCEIYVWTINDPKRIEHYLLKP 554 

Query: 181 VNGIITDEuWLFKVINKDIKNSPNYYQRALQLIDS 215 

+ GIITD+ L + KD+K 4Y+ R +++I S 
Sbjct: 555 I QGI ITDQPALTNQLI KDLKQDNSYFSRLVRI ISS 589 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 851 

A DNA sequence (GBSx0903) was identified in S.agalactiae <SEQ ID 2575> which encodes the amino 
acid sequence <SEQ ID 2576>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.02 Transmembrane 84 - 100 ( 76 - 112)

INTEGRAL Likelihood = -3.50 Transmembrane 139 - 155 ( 139 - 157)

INTEGRAL Likelihood = -2.23 Transmembrane 41 - 57 ( 39 - 59)

INTEGRAL Likelihood = -0.96 Transmembrane 179 - 195 ( 179 - 195)

Final Results

bacterial membrane --- Certainty=0.7007 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9901> which encodes amino acid sequence <SEQ ID 9902>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database, but there is
homology to SEQ ID 2574. 

A related GBS gene <SEQ ID 8671> and protein <SEQ ID 8672> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10

McG: Discrim Score: -3.38
GvH: Signal Score (-7.5) : -4.08

Possible site: 53
>>> Seems to have no N-terminal signal sequence
ALOM program count: 4 value: -15.02 threshold: 0.0

INTEGRAL Likelihood =-15.02 Transmembrane 84 - 100 ( 76 - 112)
INTEGRAL Likelihood = -3.50 Transmembrane 139 - 155 ( 139 - 157)
INTEGRAL Likelihood = -2.23 Transmembrane 41 - 57 ( 39 - 59)
INTEGRAL Likelihood = -0.96 Transmembrane 179 - 195 ( 179 - 195)
PERIPHERAL Likelihood = 2.01 104

modified ALOM score: 3.50

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.7007 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
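
The INTEGRAL lines reported for SEQ ID 8672 in this Example each give a likelihood score, a core transmembrane segment and a wider segment in parentheses. A minimal sketch of how these lines could be tabulated is given here, assuming Python. The function name `parse_segments` and the field names are hypothetical, and treating the most negative likelihood as the strongest prediction is an interpretation taken from the "value: -15.02" line of the output, not a documented rule.

```python
import re

# Hypothetical pattern for lines such as
#   INTEGRAL Likelihood =-15.02 Transmembrane 84 - 100 ( 76 - 112)
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return the predicted segments sorted by the start of the core region."""
    segments = []
    for score, start, end, ext_start, ext_end in TM_RE.findall(text):
        segments.append(
            {
                "likelihood": float(score),
                "core": (int(start), int(end)),
                "extended": (int(ext_start), int(ext_end)),
            }
        )
    return sorted(segments, key=lambda seg: seg["core"][0])


alom_output = """
INTEGRAL Likelihood =-15.02 Transmembrane 84 - 100 ( 76 - 112)
INTEGRAL Likelihood = -3.50 Transmembrane 139 - 155 ( 139 - 157)
INTEGRAL Likelihood = -2.23 Transmembrane 41 - 57 ( 39 - 59)
INTEGRAL Likelihood = -0.96 Transmembrane 179 - 195 ( 179 - 195)
"""
segments = parse_segments(alom_output)
strongest = min(segments, key=lambda seg: seg["likelihood"])
for seg in segments:
    print(seg["core"], seg["extended"], seg["likelihood"])
print("strongest:", strongest["core"])
```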

Example 852 

A DNA sequence (GBSx0904) was identified in S.agalactiae <SEQ ID 2577> which encodes the amino 
acid sequence <SEQ ID 2578>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4150 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 853 

A DNA sequence (GBSx0905) was identified in S.agalactiae <SEQ ID 2579> which encodes the amino 
acid sequence <SEQ ID 2580>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.32 Transmembrane 2 - 18 ( 2 - 18)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 854 

A DNA sequence (GBSx0906) was identified in S.agalactiae <SEQ ID 2581> which encodes the amino 
acid sequence <SEQ ID 2582>. This protein is predicted to be NAD(P)H nitroreductase YdgI. Analysis of this
protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 127 - 143 ( 12S - 143) 

Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 60 GLPESNCNQINQAQYVIALFTDTD LGQRSRKIARIGRRSLPDDLIGYYMETLPPRY 115 

L N Q+ + VIA+F D + L + K +G +P ++ + L + 
Sbjct: 67 -IASFNQTQVTTSSAVIAVFADMNNADYLEEIYSKAVELG--YMPQEVKDRQIAALTAHF 123 

Query: 116 ALYSEKQTGEYLSIjNAGIVAMNLVLALTDQGISSNMILGFDI<AITNDVLEIDK-RFRPEI 174 

+ E + ++ G+V+M L+L G +N I G+DK + +DK R+ P + 

Sbjct: 124 EKLPAQVNRETILIDGGLVSMQLMLTARAHGYDTNPIGGYDKENIAETFGLDKERYVPVM 183 

Query: 175 LITVGYSDEKVEPSYRLPVDHI IE 198 

L+++G + ++ SYRLP+D I E 
Sbjct: 184 LLSIGKAADEGYASYRLPIDTIAE 207 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2583> which encodes the amino acid 
sequence <SEQ ID 2584>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.18 Transmembrane 127 - 143 ( 126 - 143)

Final Results

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAC09964 GB:AX033132 unnamed protein product [Bacillus subtilis]
Identities = 63/204 (30%), Positives = 109/204 (52%), Gaps = 11/204 (5%)





Query: 3 FLEIjNKKRHAIKTFNDQ-PvDYEDLRTAIEIATIjAPSANNIQPWKFVVV'Q- -EKKAELAK 59 
F+E+ K R +1+ ++ + E++ +E AT APS+ N QPW+F+V+ E K +LA 
Sbjct: 7 FMEIMKGRRSIENYDPAVKISKEEMTEILEEATTAPSSVNAQPWRFLVIDSPEGKEKIA- 65



■ 1+ G+V+M L+L+ 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 157/200 (78%) , Positives = 184/200 (91%) 

Query: 1 MKFLELNKKRHAVTCHEbTOKP^ 60
MKFLELNKKRHA+K FND+PVD++D+RTAIEIATLAPSANNIQPWKFWVQEKK+ LA+G
Sbjct: 1 MKFLELNKKRHAIKTFNDQPVDYEDLRTAIEIATIAPSA1(INIQPWKFVVVQEKKAELAKG 60

Query: 61 LPESNCNQINQAQYVIALFTDTDLGQRSRKIARIGRRSLPDDLIGYXMETLPPRYALYSE 120
LP 4-N Q+ QAQYV+ALF+DTDL RSRKIARIG +SLPDDLIGYYMETLPPR+A ++E

[The rows for positions 121-180 and the query row for positions 181-200 are not legible in the source.]

SDEK EPSYRLPVD +IE+R
Sbjct: 181 SDEKPEPSYRLPVDEVIERR 200

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 855 

A DNA sequence (GBSx0907) was identified in S.agalactiae <SEQ ID 2585> which encodes the amino 
acid sequence <SEQ ID 2586>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2895 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45369 GB:U78036 dipeptidase [Lactococcus lactis] 
Identities = 312/474 (65%) , Positives = 370/474 (77%) , Gaps = 11/474 (2%) 

Query: 2 TIDFRAEVDKRKDALMDDLII^LRINSERDDSQADAEHPFGPGPVKALEFFLEMAERDGY 61 

TIDF+AEV+KRKDALM+DL +LLRI+S D ADAE+PFGPGP KAL+ FL++AERDGY 
Sbjct: 3 TIDFKAEVEKRI05ALMEDLFSLLRIDSAMDMEHADAENPFGPGPRKALDAFLKIAERDGY 62 

Query: 62 ETKNVDNYAGHFTFGQGE-- 
TKN UNY GHF + G 

Sbjct: 63 

Query: 118 DDKGPTMACYYALKIIKELGLPTSKKVRFVVGTDEESGWGDMDYYFEHVGLPKPDFGFSP 177 

DDKGPT+ACYYALKI+KEL LP SKK+RF+VGT+EE+GW DMDYYFEH LP PDFGFSP 
Sbjct: 123 DDKGPTVAOTALKILKELNLPLSKKIRFIVGTNEETGWADMDYYFEHCELPLPDFGFSP 182 
Query: 178 DAEFPIINGEKGNITEYLHFSGENKGAVRLHSFSGGLREHMVPESATARFTSHLDQTTLG 237

DAEFPIINGEKGNITEYLHFSG+N G V LHSF GL ENMVPESATA + D L 
Sbjct: 183 DAEFPIINGEKGNITEYLHFSGKNAGQVVLKSFKAGLAENMVPESATAVISGAKD IiE 239 

Query: 238 ASLADFASKH NLKAELSVEDEQYTATVYGKSAHGSTPQSGVNGATYLALYLSQFDFE 294 

A+L F ++H NL+ +L D + T T+YGKSAHG+ P++G+NGATYL L+L+QFDF 
Sbjct: 240 AALEKFVAEHASKNLRFDLEEADGKATITLYGKSAHGAMPEKGINGATYLTLFLNQFDFA 299 

Query: 295 GPA^FLDVTANIIHEDFSGEKLGVAYEDDCMGPLSMMAGVFQFDEraDDNTIAIiNFRYP 354 

A AF+ V A + ED GEKLG A+ D+ M SMMAGV+ FDE N + IALNFR+P 
Sbjct: 300 DGAAAFIWGAEKLLEDHEGEKLGTAFVDEI^EKTSNMAGWSFDE-NGEGKIALNFRFP 358 



Query: 415 GGGTFGRLLERGVAYGAMFPGDENTMHQANEYMPLENIFRSAAIYAEAIYELIK 468 

GGGTFGRLLERGVAYGAMF G+ ++MHQANE P+ENI+++A IYAEAIYEL K 
Sbjct: 419 GGGTFGRLLERGVAYGAMFEGEPDSMHQANEMKPVENIYKAAVIYABAIYELAK 472 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2587> which encodes the amino acid 
sequence <SEQ ID 2588>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3107 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 361/467 (77%) , Positives = 403/467 (85%) 



[Only the match lines of this alignment are legible in the source; the sequence rows are lost. Each surviving match line is shown with the start positions of its block.]

Query: 2 / Sbjct: 20
TIDF+AEVDKRK A++ DL++LLRINSERDD AD +HPFGPGPVKALE FL MAERDGY

Query: 122 / Sbjct: 140
PTMACYYALKIIKELGLP SKKVRF+VGTDEESGWGDMDYYF H GL PDFGFSPDAEF

Query: 242 / Sbjct: 260
T+ GKSAHGSTP+ GVNGAT LA +L+QF FEG A+ +L

Query: 302 / Sbjct: 320
++HEDF+ EKLG+AY DD MG LSMNAGVF FD + DNT I ALNFRYP+GTDA T

Query: 362 / Sbjct: 380
LEKL G+ KV+LS+HEHTPHWPMDDELV+TLLAVYEKQTGLKG+EQVIGGGTFGR

Query: 422 / Sbjct: 440
LLERGVA+GAMFPGDENTMHQANEYMPLENI+RSAAIYAEAIYELIK

[No rows survive for the blocks starting at Query: 62 / Sbjct: 80 and Query: 182 / Sbjct: 200.]

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 856 

A DNA sequence (GBSx0908) was identified in S.agalactiae <SEQ ID 2589> which encodes the amino
acid sequence <SEQ ID 2590>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5598 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC21888 GB:U32707 H. influenzae predicted coding region
HI0220.2 [Haemophilus influenzae Rd]
Identities = 123/192 (64%), Positives = 160/192 (83%), Gaps = 1/192 (0%)



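[Only the match lines of this alignment are legible in the source; the sequence rows are lost. Each surviving match line is shown with the start positions of its block.]

Query: 1 / Sbjct: 21
- +L++I +1 +D QN4-++TE GI PLF+APKTARIN1VGQAPGLK +++RLYW DKSG

Query: 61 / Sbjct: 81
DRLR+WLGVD + FY+SG FAVLP+DFYYPG GKSGDL PR+GFAE+WHP-ML +PN+Q

Query: 121 / Sbjct: 141
LT+L+GQY QKYYL + N+T TVK Y+ +LP ++PLVHPSPRNQ+W+ KNPWFE+ H

[No rows survive for the final block (Query: 181 / Sbjct: 200).]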



A related DNA sequence was identified in S.pyogenes <SEQ ID 2591> which encodes the amino acid
sequence <SEQ ID 2592>. Analysis of this protein sequence reveals the following:



Final Results 

bacterial cytoplasm --- Certainty=0.3740 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 122/189 (64%) , Positives = 150/189 (78%) 

Query: 4 LEKIIKAIKSDSQNQNYTENGIDPLFAAPKTAR1NIVGQAPGLKTQEARLYWKDKSGDRL 63 

++ + KAI +D H +YTE GI PL+ AP+TARI IVGQAPG+ Q +LYW D+SG RL 
Sbjct: 1 MDDLTKAIMADEAMiSYTERGIFPLYDAPC/TARIIIVGQAPGIVACGTKLYWNDRSGIRL 60 

Query: 64 RQWLGVDEETFYHSGKFAVLPLDFYYPGKCSKSGDLSPRKGFAEKWHPLILKEMPNVQLTL 123 

R WLGVD +TFYHSG F ++P+DFYYPGKGKSGDL PR+GFA KWHP + MP V+LT+ 
Sbjct: 61 RDWLGVDNDTFYHSGLFGIIPMDFYYPGKGKSGDLPPREGFAAKWHPPLRALMPEVELTI 120 

Query: 124 LVGQYTQKYYLGSSAHKNLTETVKAYKDYIjPDYIjPLVH^ 183 

LVG+Y Q +YLG+ A+K LTETV+ ++DYLPDY PLVHPSPRNQ+WL KNPWFE+DL+ 
Sbjct: 121 LVGRYAQDFYLGNKAYKTLTETVRHFEDYLPDYFPLVHPSPRNQLWLAKNPWFEQDLLPI 180

Query: 184 LQKIVADIL 192

LQK V IL 
Sbjct: 181 LQKRVEAIL 189 
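
The identity and positive counts quoted for an alignment can be recounted from the match line printed between each Query and Sbjct row, where a letter marks an identical residue and `+` marks a conservative substitution. The following is a minimal Python sketch of that count, applied to the final nine-residue block above. The function name `count_matches` is hypothetical and is not part of the original analysis, and the match line is written with the column spacing that the query and subject rows imply.

```python
def count_matches(midline: str) -> tuple[int, int]:
    """Return (identities, positives) for one BLAST match line.

    Letters mark identical residues and '+' marks a conservative
    substitution. Positives include identities, as in BLAST summaries.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    conservative = midline.count("+")
    return identities, identities + conservative


# Match line of the last block above (Query 184-192 against Sbjct 181-189).
identities, positives = count_matches("LQK V  IL")
print(identities, positives)  # 6 6
```

Summing these per-block counts over all blocks of an alignment should give the numerators of the "Identities" and "Positives" figures, provided the match lines are intact.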


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 857 

A DNA sequence (GBSx0909) was identified in S.agalactiae <SEQ ID 2593> which encodes the amino
acid sequence <SEQ ID 2594>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4178 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
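
Each "Final Results" block lists three candidate locations with a certainty score and a call. As a hedged illustration of how such a block could be tabulated, the Python sketch below parses the three lines and picks the highest-scoring location. The regular expression, the function name `parse_final_results` and the `SAMPLE` text are assumptions made for this example. They are not part of the PSORT program or of the original analysis.

```python
import re

# Matches lines such as
# "bacterial cytoplasm --- Certainty=0.4178 (Affirmative) < succ>".
# Optional spaces around "=" and "." are tolerated.
RESULT_RE = re.compile(
    r"^(bacterial \w+)\W*Certainty\s*=\s*(\d\s*\.\s*\d+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, call) tuples."""
    rows = []
    for line in text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            location, score, call = match.groups()
            rows.append((location, float(re.sub(r"\s+", "", score)), call.strip()))
    return rows


SAMPLE = """\
bacterial cytoplasm --- Certainty=0.4178 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

rows = parse_final_results(SAMPLE)
best = max(rows, key=lambda row: row[1])
print(best)  # ('bacterial cytoplasm', 0.4178, 'Affirmative')
```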

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 858 

A DNA sequence (GBSx0910) was identified in S.agalactiae <SEQ ID 2595> which encodes the amino
acid sequence <SEQ ID 2596>. Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2779 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9899> which encodes amino acid sequence <SEQ ID 9900>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35886 GB:AE001748 conserved hypothetical protein [Thermotoga maritima] 
Identities = 36/124 (29%) , Positives = 58/124 (46%) , Gaps = 3/124 (2%) 

Query: 19 VPTKELLADYFNRMEFAIGRVEAHVLAHFDYGFRKLNLDTODLKPFETQLKRIFIKMLSK 78
+P EL DY R F + RV+ H LAH DY R D K +++I + ++
Sbjct: 98 LPPDELARDYLERTliFVMERVKFHTLAHLDYPARYAKAD FKANRDLIEKILVFLVKN 154

Query: 79 GIAFEUNTKSLYLYGMKLYRYALEILKQLGCKQYSIGSDGHIPEHFCYEFDRLQGLLKD 138
A E+NT L+ +G + +E+ LG + +IGSD H +H + + LK
Sbjct: 155 EKALEINTAGLFKSGKPNPDYWIVEICrro«3GRVVTIGSDAHESQHIGRGIEEVMRELKK 214

Query: 139 YQID 142
Sbjct: 215 FNFE 218

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 859 

A DNA sequence (GBSx0911) was identified in S.agalactiae <SEQ ID 2597> which encodes the amino
acid sequence <SEQ ID 2598>. This protein is predicted to be alkaline amylopullulanase (pulA). Analysis of
this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.08 Transmembrane 1225 -1241 (1222 -1247) 
INTEGRAL Likelihood = -2.44 Transmembrane 19 - 35 ( 18 - 36) 
INTEGRAL Likelihood = -0.11 Transmembrane 1146 -1162 (1146 -1162) 
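
Each INTEGRAL line gives a likelihood score, a predicted transmembrane core range and, in parentheses, an extended range. The Python sketch below shows one way such lines could be read into a structured form. The pattern `TM_RE`, the function `parse_tm_segments` and the dictionary keys are hypothetical names chosen for this example. Treating the most negative likelihood as the strongest prediction is an assumption, based on the "ALOM program ... value: -10.08" line reported for the related sequence in this example.

```python
import re

# One INTEGRAL line: likelihood, core range, then extended range in parentheses.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per predicted transmembrane segment."""
    segments = []
    for m in TM_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return segments


SAMPLE = """\
INTEGRAL Likelihood =-10.08 Transmembrane 1225 -1241 (1222 -1247)
INTEGRAL Likelihood = -2.44 Transmembrane 19 - 35 ( 18 - 36)
INTEGRAL Likelihood = -0.11 Transmembrane 1146 -1162 (1146 -1162)
"""

segments = parse_tm_segments(SAMPLE)
strongest = min(segments, key=lambda seg: seg["likelihood"])
print(strongest["core"], strongest["likelihood"])  # (1225, 1241) -10.08
```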

Final Results 

bacterial membrane --- Certainty=0.5034 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

imoniae] 

, Gaps = 88/1311 (6%) 

Query: 1 MKRKDLFGDKQTQYTIRKLSVGVASVATGVCI FLHSPQVFAEEVSASPANTAIAESNINQ 60
M++ +K+ Y+IR L G SV G + h A+A 1 +
Sbjct: 1 MRKTPSHTEKKMVYSIRSLKNGTGSVLIGASLVIi LAMATPTISS 44

Query block starts: 61, 120, 180, 240, 300, 359, 419, 479, 537
Sbjct block starts: 45, 98, 158, 217, 277, 337, 397, 457, 517

(Only the following fragments of these blocks survive in the source.)

- -AAGSGKN3SDISSPGNAWASLEKTEEKP 97

VESTGLWIWGDVDQPSSIWPNGAIPMTDAKKDDYGTmiFKLSEKQRKQISFLINNKAGT 239
++ GLW W DV++PS NWPNGA+ DAKKDDYGYY+D KL +Q K+ISFLINN AG
-DAQGLWTWDDWKPSEWIPNGALSFKDAKKDDYGTYLDVKLKGEQAKKISFLINNTAGK 216

L P+MN+ W+D+ Y +Y+P G VR+NY
PS+ WPDG++F G YGRYID+ L A+E GFL+LDESK GD VK++ +Y F DL
N + I+D+ +D + + GDF+ + + +SYN + T+ SW KD+ Y+Y G
LGA L ++G +V+ +LWSPSAD V++++YDK++ ++W T L K +G W+ LD+
KLGI ++TGYYY Y+I+R V LDP YAKS LA VI+S+ ++D K AKAAFV+P++I

Query: 595 GPQNLSFAKIANFKGRQDAVIYEAHVRDFTSDRSLDGKLKNQFGTFAAFSEKLDYLQKLG 654

Query: 655 WHIQLLPVLSYFYVNEmKSRSTA-yTSSDNNYNVJGYDPQSYFALSGMYSEKPKDPSAR 713

VTHIQLLPVLSY++VNE+ + Y SS++NYMW6YDPQ+YF+L+GMYS PK+P R 

Sbjct: 637 VTHIQLLPVLSYYFVNELKNHEHLSDYASSNSNY^IVJGYDPQNYFSLTGMYSSDPKNPEKR 696 

Query: 714 IAELKQLIHDIHIQlGMGVILDWYNHTAKTYIiFEDIEPNYYHFMNEDGSPRESFGGGRLG 773 

IAE K LI++IHKRGMG ILDWYNHTAK +FED+EPNYYHFM+ DG+PR SFGGGRLG 
Sbjct: 697 IAEFKKIiINEIHKRGMGAILDWYWHTAKVDIFEDLEPNYYHFMDADGTPRTSFGGGRLG 756 

Query: 774 TTHAMSRRVLVDSIKYLTSEFKOTGFRFDMMGDHDAAAIELAYKEAKAINPNMIMIGEGW 833 



Query: 834 RTFQGDQGQPVKPADQDWMKSTDTVGVFSDDIRNSLKSGFPNEGTPAFITGGPQSLQGIF 893 

RT+ GD+ P K ADQDWMK TDTV VFSDDIRN+LKSG+PNEG PAFITGG + + IF 
Sbjct: 817 RTYAGDE1MPTKAADQDWMKHTDTVAVFSDDIRNNLKSGYPNEGQPAFITGGKRDVNTIF 876 

Query: 894 KNIKAQPGNFEADSPGDWQYIAAHDNLTLHDVIAKSINKDPKVAEE- -EIHRRLRLGOT 951 

KM+ AQP NFEADS PGDV+QYIAAHDNLTIi D+IA+SI KDP AE EIHRRLRLGN+ 
Sbjct: 877 KNL1AQPTNFEADSPGDVIQYIAAHDNLTLFDIIAQSIKKDPSKAENYAEIHRRLRLGNL 936 

Query: 952 

Sbjct: 937 

Query: 1008 SYDSSDAINHFDWAAATDNNKHPISTKTQAYTAGLITLRRSTDAFRKLSKAEIDREVSLI 1067 

SYDSSDA+N FDW ATD +P + K++ Y GLI LR+STDAFR S +1 V LI 
Sbjct: 997 SYDSSDAVNKFDWTKATDGKAYPENVKSRDYMKGLIALRQSTDAFRLKSLQDIKDRVHLI 1056 

Query: 1068 TEVGQGDIKEKDLVIAYQTIDSKGDIYAVFWADSKARNVLLGEKYKHLLKGQVIVnADQ 1127 

T GQ ++++D+VI YQ GDIYAVFVNAD KAR LG + HL +V+ D +Q 
Sbjct: 1057 TVPGQNGVEKEDWIGYQITAPNGDIYAVFVNADEKAREFNLGTAFAHLRI^IAEVIjADENQ 1116 

Query: 1128 AGIKPISTPRGVHFEKDSLLIDPLTAIVIKVGKVAPS PKEELQAD 1172 

AG 1+ P+G+ + + L ++ LTA V++V + S P+ + +A 

Sbjct: 1117 AGSVGIAWPKGLEWTEKGLIOiimLTATVLRVSQNGTSHESTAEEKPDSTPSKPEHQNEAS 1176 

Query: 1173 YPKTQ SFKESKTVEKVNRIANKT ---SITPWSKKADS 1207 

+P Q + ++K + N+ + T S+ V K++ 

Sbjct: 1177 HPAHQDPAPEARPDSTKPDAKVADAENKPSQATADSQAEQPAQEAQASSVKEAVRKESVE 1236 

Query: 1208 YLTNE ANLPKTGDKSSKILSWGISILASLIALVGLSLKRNR 1249 

+ E A LP TG K+ L GIS+LA LL L G LK + 

Sbjct: 1237 NSSKENISATPDRQAELPNTGIKNENKLLFAGISLLA-LLGL-GFLLKNKK 1285 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2599> which encodes the amino acid 
sequence <SEQ ID 2600>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.83 Transmembrane 1153 -1169 (1148 -1171)
INTEGRAL Likelihood = -1.97 Transmembrane 29 - 45 ( 28 - 46)

Final Results

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9125> which encodes the amino acid sequence
<SEQ ID 9126>. Analysis of this protein sequence reveals the following:

Final Results

bacterial membrane --- Certainty=0.533 (Affirmative) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

LPXTG motif: 1133-1137 
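
An LPXTG motif is a five-residue pattern in which X stands for any amino acid, so a reported position such as 1133-1137 spans exactly five residues. The Python sketch below shows a simple way to locate such motifs with 1-based inclusive coordinates. The function name `find_lpxtg` is hypothetical, and the input is a short illustrative fragment containing the LPKTG sequence seen near the C-terminus of the GBS protein in this example, not a complete SEQ ID sequence.

```python
import re

# "X" in LPXTG means any single residue, hence the "." wildcard.
LPXTG_RE = re.compile(r"LP.TG")


def find_lpxtg(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) with 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in LPXTG_RE.finditer(sequence)]


# Illustrative fragment only (hypothetical input, not a full sequence).
fragment = "YLTNEANLPKTGDKSSK"
print(find_lpxtg(fragment))  # [(8, 12, 'LPKTG')]
```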

An alignment of the GAS and GBS proteins is shown below. 

Identities = 715/1097 (65%) , Positives = 872/1097 (79%) , Gaps = 21/1097 (1%) 



Query block starts: 156, 216, 276, 336, 396, 456, 516, 574, 634, 694, 754, 814, 874, 934
Sbjct block starts: 95, 154, 214, 274, 334, 394, 453, 513, 573, 633, 693, 753, 813, 873

(Only the following fragments of these blocks survive in the source.)

Y+D L+ K R+Q+£
1LD+SKTGDA+KVQP DY+F++L NH Q+FVKD DPKVYNNPYYIDQV LK A+Q
I+A FTTLDG+B+

4IIYDKDNQNR 515
h RQSW+ KD+LYAY G LGA L +DGS V+ +LWSPSAD+V +++YDK +Q R
/ARQSWQLKDKLYAYDGEI^TLAKDGS-vDLALWSPSADTVKVVVYDKQDQTR 452

L K++KGVW+ L D+ GI +YTGYYYLYEI RG++KV +LDPYAKSLA W+
T DDIKTAKAAF++PS+LGP h FAKI NFK R+DA+IYEAHVRDFTSD++L+GKL
YHFMN DG+ RESFGGGRLGTTHAMSRR+LVDSI YLT EFKVDGFRFDMMGDHDAAAIE
A+K AKAINPN IMIGEGWRT+QGD+G+ ADQDWMK+T+TVGVFSDDIRN+LKSGF
PNEGT AFITGG ++L+G+FK IKAQPGNFEAD+EGDWQY1AAHDNLTLHDVIAKSINK
DPKVAEEEIH+R+RLGN MILT4QGTAFIHSGQEYGRTK-LLNPDY TK SDDK+PNKAT

Query: 994 LIEAVECEYPYFIHDSYDSSDAINHFDWAAATDNNKHPISTlCrQAYTAGMTLRRSTDAFR 1053

LI+AV +YPYFIHDSYDSSDA+NHFDWA ATD+ HPIS +T+AYT GLI LRRSTDAF
Sbjct: 933 LIDAVAQYPYFIHDSYDSSDAVMHPDWAKATDSIAHPrSNQTKAYTQGLIALRRSTDAPT 992 

Query: 1054 KLSKAEIDREVSLITEVGQGDIKEKDLVmYQTIDSKGDIYAVFVNADSKARNVLLGEKY 1113 

K +KAE+DR+V+LIT+ GQ I+++DL++ YQT+ S GD YAVFVNAD+K R V+L + Y 
Sbjct: 993 KATKAEVDRDVTLITQAGQDGIQQEDLIMGYQTVASNGDRYAVFTOADNKTRKVVLPQAY 1052 

Query: 1114 KHLLKGQVIVDADQAGIKPISTPRGVHFEKDSLLIDPLTAIVIKV-GKVAPSPKEELQAD 1172 

++LL QV+VDA+QAG+ 1+ P+GV F K+ L 1+ LTA+V+KV K A ' +++ Q D 
Sbjct: 1053 RYLLGAQVLVDAEQAGVTAI AKPKGVQFTKEGLT I EGLTALVLKVS SKTANPSQQKSQTD 1112 

Query: 1173 YPKTQSFKESKTVEKVNRIANKTSITPWSKKADSYLTNEANLPKTGDKSSKILSWGIS 1232 

+T++ SK ++K K + T LPKTG+ SSK L GI+ 

Sbjct: 1113 NHQTKTPDGSKDLiDKSLMTRPKRAKT NQKLPKTGEASSKGLLAAGIA 1159 

Query: 1233 ILASLLALVGLSLKRNR 1249 

+ LL + L +KR + 
Sbjct: 1160 L LLLAISLLMKRQK 1173

A related GBS gene <SEQ ID 8673> and protein <SEQ ID 8674> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 

McG: Discrim Score: -0.88 

GvH: Signal Score (-7.5): 4.13 
Possible site: 41

>>> Seems to have no N-terminal signal sequence

ALOM program count: 3 value: -10.08 threshold: 0.0 

INTEGRAL Likelihood =-10.08 Transmembrane 1225 -1241 (1222 -1247) 
INTEGRAL Likelihood = -2.44 Transmembrane 19 - 35 ( 18 - 36) 
INTEGRAL Likelihood = -0.11 Transmembrane 1146 -1162 (1146 -1162) 

PERIPHERAL Likelihood = 2.44 653 
modified ALOM score: 2.52 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.5034 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 1081-1085 

The protein has homology with the following sequences in the databases: 

ORF00953(1111 - 3768 of 4356) 

EGAD|165156|TM1845 (18 - 840 of 843) pullulanase {Thermotoga maritima} SP|O33840|PULA_THEMA
PULLULANASE PRECURSOR (EC 3.2.1.41) (ALPHA-DEXTRIN ENDO-1,6-ALPHA-GLUCOSIDASE) (PULLULAN
6-GLUCANOHYDROLASE). GP|2815006|emb|CAA04522.1||AJ001087 pullulanase {Thermotoga maritima}
GP|4982428|gb|AAD36907.1|AE001821_7|AE001821 pullulanase {Thermotoga maritima}
PIR|H72204|H72204 pullulanase - Thermotoga maritima (strain MSB8)

%Match =8.4 

%Identity =30.6 %Similarity =52.8 

Matches = 210 Mismatches = 298 Conservative Sub.s = 152 

1032 1062 1092 1122 1152 1182 1212 1242 

NKAGTNLSGDHHIPLLRPEMNQWIDEKYGTHTYQPLKEGYVRINYLSSSSNYDHLSAWLFKDVATPSTTWPDGSNFVNQ 
I : : | : ::| || : |:: | : | : 

MKTKLWLLLVLLLSALIFSETTIVTOYHRYDGKYDGVJNLWIWP--VEPVSQEGKAYQFTGE 
10 20 30 40 50 

1272 1302 1329 1359 1668 1698 

GLYGRYIDVSLKTNAKEIGFLI -LDESKTGDAVKVQPNDYVFRDLA PKQGHFNISYNGNNVMTRQSWEFKDQL- - - 

:|: | | : ::| :: |=] |= || 

DDFGKVAWKLPMDLTKVGI IVRLNE -- WQAKDVAKDR

1746 1776 1806 1836

YAYSGNLGAVIjNQDGSKVEASLWSPSADSVTMI iydkdn 

I I I llll i « I =111 : I ::== 
FIEIKDGKAEVWILQGV .ELIIEGYKPARVIKKEILDDYYYD3ELQRVYSP3--KTIFRVWSPVSKMVKVIiLFKNaB 



1866 1896 1926 1956 1986 2016 2046 2076 

QlSniWATTPLMKNNKGOTQTILDTKLGIKNYT^

|||: ::: | | :|||::: : ||),|, : : |:| || : 

DTEPYQWNMEYKGWGVWEAWEGDL DGVFYLYQLENYGKIRTTVDPYSKAVYA NSKKSAWNLA 

270 280 290 300 310 320 

2106 2136 2166 2196 2226 2253 2283

QLGPQNLSFAK1AHFKGRQDAVIYEAHVRDPTSDRSLDGKLKNQFGTFAAFSEK LDYLQKLGVTHIQLL 

: h s :| :||:||| I : I I = = I I = I = : : I = I :| 

RTNPEGWENDRGPKIEGYEDAI IYEIHIADITG--LENSGVKNK-GLYLGLTEENTKGPGGVTTGLSHLVELGVTHVHIL 
330 340 350 360 370 380 390 



2313 2343 2373 2403 2433 2463 2493 

PVLSYFYVNEMDKSRSTAYTSSDOTTYNWGYDPQSYFALSGMYSEKPKDPSARIAELKQLIHDIHKRGMGVILDVVYl^ 
I : --HI = lllllll = I II Ihl II I I = : = =11 I : I I I = I : I = II 

PFFDFYTGDELDK DFEKYYNWGYDPYLFMVPEGRYSTDPKNPHTRIREVKEMVKRLHKHGIGVIMDMVFPHTY 

410 420 430 440 450 460 470

2544 2574 2601 2631 2661 2691 2721 2751 

- - AKTYLFED I E PNYYHFMNEDGS P - RESFGGGRLGTTHAMSRRVLVDS I KYLTSEFKVDGFRFDMMGDHDAAAI ELAYK 
: |: I |:: ::: |: || | : : | |: :||:: | |: :|||||| || | : : 
GIGELSAFDQTVPYYFYRIDKTGiAYliNESGCGNVIASERPMRKFIVTlTVTYVTOOiYIlIDGFRFDQMGLIDKKTMLEVER
480 490 500 510 520 530 540 550



2781 2811 2841 2871 2901 2931 2979 

EAKAINPNMIMIGEGWRTFQGDQGQPVKPADQDWKKSTDTVGVFSDDIRNSLKSGFPNEGTPAFITGG PQSLQGIF 

1=1 :|= II I I I l== 1=1 1=1= I:::: I 1= II = =1= 

ALHKIDPTIILYGEPW GGWGAPIRFGKSD- - VAGTHVAAFNDEFRDAIRGSVFKPSVKGFVMGGYGKETKIKRGW 

560 570 580 590 600 610 620



3030 3060 3084 3114 3144 3174 3204 

KNIKAQPG- - -NFEADSPGDWQYIAAHDNLTLHD- - VIAKSINKDPKVAEEEIHRRLRLGNVMILTSQGTAFIHSGQEY

=1 =1 I I = = I I III II I =1 =1 = 111= =1 ==11111 1=1 ll== 

GS INYDGKLI KS FAIiD - PEETIOTAACHDITOTLWDKI^IAAKADKKKEWTEEELKNAQKLaGAI LLTSQGVPFLHGGQDF 
640 650 660 670 680 690 700 

3234 3264 3294 3324 3354 3384 3414 3444

GRTKRLmPDYMTKVSDDKLPNKATLIEAVKEYPYFIHDSYDSSDAINHFDWAAATDNMKHPISTKTQAYTAGLITljRRS 
III I =ll== =11 11= « I III 11= 

CRTKN FNDNSYNAPISINGFDY ERKLQFIDVFNYHKGLIKLRKE 

710 720 730 740 


3474 3504 3534 3564 3594 3624 3654 

TDAFRKLSKAEI DREVSLITEVGOGDIKEKDLVIAYQTIDSKGDIYAVFVNADSKARNVLLGEKYKHLLK 

III = II I {■■:■■ = I 11 = 1= I I III = 

HPAFRLKNAEEIKKHLEFLPGGRRIVAFMLKDHAGGDPWKDIWIYN GNLEKTTYK- LPE 

760 770 780 790 800



3678 3708 3738 3768 3798 3828 3858 3888 

GQ--VIVDADQAGIKPISTPRGVHFEKDSLLIDPLTAIVIKVGKVAPSPKEELQADYPKTQSFKESKTVEKVKRIANKTS 
I: =11 =111 = = =111=1 1= 

GKWNWVNSQKAGTEVIETVEG TIELDPLSAYVLYRE

820 830 840 

SEQ ID 2598 (GBS5) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 3 (lane 7; MW 134kDa). 



The His-fusion protein was purified as shown in Figure 190, lane 7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 860 

A DNA sequence (GBSx0912) was identified in S.agalactiae <SEQ ID 2601> which encodes the amino
acid sequence <SEQ ID 2602>. Analysis of this protein sequence reveals the following:



Possible site: 26

>>> Seems to have no N-terminal signal sequence

INTEGRAL Transmembrane 231 - 247
INTEGRAL Transmembrane 50 - 66
INTEGRAL Transmembrane 23 - 39
INTEGRAL Transmembrane 173 - 189
INTEGRAL Transmembrane 299 - 315
INTEGRAL Transmembrane 115 - 131
INTEGRAL Transmembrane 80 - 96
INTEGRAL Transmembrane 97 - 113

(The Likelihood values and extended ranges of these eight segments are illegible in the source.)

Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8675> which encodes amino acid sequence <SEQ ID 8676> 
was also identified. Analysis of this protein sequence reveals the following: 



McG: Length of UR: 19
Peak Value of UR: 3.08
Net Charge of CR: 1
McG: Discrim Score: 9.76
GvH: Signal Score (-7.5): -4.57
Possible site: 22
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 7 value: -10.72 threshold: 0.0
INTEGRAL Likelihood values (truncated in the source): -10, -8, -6, -5, -4, -3, -0.
Transmembrane range fragments (truncated in the source): 217 - 233 ( 214 - 23; 52 (; 25 (; 175 (; 154 - 182); 136
PERIPHERAL Likelihood = 0.26
modified ALOM score: 2.64
icml HYPID: 7 CFP: 0.529

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB08178 GB:AB036768 exfoliative toxin A [Staphylococcus hyicus] 
Identities = 134/298 (44%) , Positives = 197/298 (65%) 

Query: 22 PL VMAGLVLGLLALGNLLEGYGTYVRYCLGLVALVFWI FL I KGILKNKKESRKELSNPLI 81 

PLV +GLVLGLL LGNLL+ + G++A++ W+ L+ + N + +L++PL+ 

Sbjct: 7 PLVSSGLVLGLLGLGNLLKDVSLSIJJALCGIIAILvWiHL^ 66 

Query: 82 ASVFTTFFMAGMILSTYILLFRSIiGIWVAVLSEGviWLSFIALIIHMA.IFSWKYLRHFSM 141

+SVFTTFFM+G + +TY+ F S ++ L +W L I ++ HM IFS KYL4- FS+
Sbjct: 67 SSVFTTFFMSGFLGTTYLOTFFSHISFIHHLITPLWLLCLIGILTHMIIFSHKYLKSFSL 126 

Query: 142 ANLFPSWSVLWGIGVASLTAPISGQFTIGKIOTWYGFIATLVLLPFLFIKAYKIGLPSA 201 

N++PSW+VLY+GI +A LTAP+SG F IGK+ YGF+AT ++LP +F + L ++ 

Sbjct: 127 ENVYPSWTVLY1GIAIAGLTAPVSGYFFIGKLTVIYGFVATCIVLPLVFKRLKTYPLQTS 186 

Query: 202 VKPNITTICAPMSLITAGYVNSFVSPNRGLLLLLIVMAQFLYFFILFQVPKLLIGDFTPG 261 

+KPN +TICAP SL+ A YV +F + +++L ++++Q YF+I+FQ+PKLL F+P 
Sbjct: 187 IKPNTSTICAPFSIjVAAAYVLAFPEAHDFWILFLILSQVFYFYIVFQLPKLLREPFSPV 246 

Query: 262 FSAFTFPLVISATSLKLSIQHLSLPVDIQGLVBFEIGTTTLIVMIVMVRYIFFLRRTI 319 

FSAFTFPLVISAT+LK S+ L P GL+ FE T+IV V YI + + 
Sbjct: 247 FSAFTFPLVISATALKNSMPILIFPEIWNGLLMFETVLATVIVFRVFFGYIHLFLKPV 304 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2603> which encodes the amino acid 
sequence <SEQ ID 2604>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood values (truncated in the source): -9., [illegible], [illegible], -5., -3., [illegible], -1.

Final Results

--- Certainty=0.4927 (Affirmative) < succ>
--- Certainty=0.0000 (Not Clear) < succ>
--- Certainty=0.0000 (Not Clear) < succ>

(Of the three location labels, only "bacterial cytoplas" is legible in the source.)



An alignment of the GAS and GBS proteins is shown below. 

Identities = 138/305 (45%), Positives = 200/305 (65%), Gaps = 5/305 (1%) 

Query: 12 RYMMKNWEKPPLVMAGLVUGLIALGl^LEGYGTYTOYCLGLVALVFWIFLIKGILKNKKE 71 

R +MK+ + PPLVM+GL LG L+ GNLL Y + Y L AL + L+ G+++N + 
Sbjct: 12 RTLMKHLKTPPLVMSGMLGTLSFGNLLATYVSIF1WLGILAALFIYGILLVGMVRNLND 71 

Query: 72 SRKELSNPLIASVFTTFFMAGMILSTYILLFRSLGIWAVLSKGVWWLSFIALIIHMAIF 131 

++ +L PLIASVF TFFM GM+LS+ h G W+ L+ WWL F+ ++ +A + 
Sbjct: 72 TKMQLRQPLIASVFPTFFMTGMLLSSLFLKVTG-GCWLGFLT WWLFFLGNLVLIAYY 127 

Query: 132 SWKYLRHFSMANLFPSWSVLYVGIGVASLTAPISGQFTIGKIVFWYGFIATLVLLPFLFI 191 

++++ FS N+FPSWSVL+VGI +A+LTAP S QF +G+++FW 4 T V+LPF+ 
Sbjct: 128 QYRFVFSFSWDNVFPSWSVLFVGIAMAALTAPASRQFLLGQVIFWVCLLLTAVILPFMAK 187 

Query: 192 KAYKIGLPSAVKPNITTICAPMSLITAGYVNSFVSPKIRGLLLLLIVMAQFLYFFILFQVP 251 

K Y IGL AV PNI+T CAP+SL++A Y+ +F P G+++ L+V +Q LY F++ Q+P 
Sbjct: 188 KTYGIGLGQAVMPNISTFCAPLSLLSASYLATFPRPQVGMVIFLLVSSQLLYAFVWQLP 247 

Query: 252 KLLIGDFTPGFSAFTFPLV1SATSLKLSIQHLSLP-VDIQGLVHFEIGTTTLIVMIVMVR 310 

+LL F PGFSAFTFP VISATSLK+4-+ L + Q L+ E+ T +V V 
Sbjct: 248 RLLNRPMPGFSAFTFPFVISATSLKMTLSFLGWC^LGWQVLLLGEVLLATALVTYVYGA 307 

Query: 311 YIFFL 315 
Y+ FL 

Sbjct: 308 YLRFL 312 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 861

A DNA sequence (GBSx0913) was identified in S.agalactiae <SEQ ID 2605> which encodes the amino 

acid sequence <SEQ ID 2606>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2607> which encodes the amino acid 
sequence <SEQ ID 2608>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 45/57 (78%), Positives = 53/57 (92%) 


Query: 1 MVKKFAFAKGIATGWATAATIAGAAFAIKKTIIEPEEEKIAFIEENRKKAARKRVS 57 

MVKK+ F KG+ATGV+ATAAT+AGA FA+KKTI I +PEEEK AFIEENRKKAAR+RV+ 
Sbjct: 1 WKKYQFWGIATGVIATAATVAGAVFAVKKTIIDPEEEKAAFIEENRKKAARRRVA 57 
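
The summary line that precedes each alignment gives counts as fractions followed by a percentage. As a rough illustration, the Python sketch below extracts those fractions and recomputes the ratios. The pattern `STAT_RE` and the function `parse_blast_summary` are hypothetical names for this example, not part of BLAST itself. For the line used here, the printed values (78% and 92%) correspond to the recomputed ratios truncated to whole numbers.

```python
import re

# Captures "Identities = 45/57", "Positives = 53/57" and, when present, "Gaps = n/d".
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_blast_summary(line):
    """Return {statistic name: (count, alignment length)} for one summary line."""
    return {name.lower(): (int(n), int(d)) for name, n, d in STAT_RE.findall(line)}


stats = parse_blast_summary("Identities = 45/57 (78%), Positives = 53/57 (92%)")
for name, (count, length) in stats.items():
    print(f"{name}: {count}/{length} = {100 * count / length:.1f}%")
# identities: 45/57 = 78.9%
# positives: 53/57 = 93.0%
```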

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 862 

A DNA sequence (GBSx0914) was identified in S.agalactiae <SEQ ID 2609> which encodes the amino
acid sequence <SEQ ID 2610>. This protein is predicted to be tRNA isopentenylpyrophosphate transferase
(miaA). Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9897> which encodes amino acid sequence <SEQ ID 9898>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06085 GB:AP001515 tRNA isopentenylpyrophosphate transferase
[Bacillus halodurans]
Identities = 139/311 (44%), Positives = 200/311 (63%), Gaps = 21/311 (6%)

Query: 7 KIKLIAWGPTAVGKTAMIELAKTFNGEIISGDSQQVYQKLDIGTAKASKEEQEQAYHH 66 

K KL+A+VGPTAVGKT + IAK NGE+ISGDS QVY+ +DIGTAK + EE + HH 
Sbjct: 2 KEKLVAIVGPTAVGKTKTSVMIiAKRLNGEVISGDSMQVYRGMDIGTAKITAEEMDGVPHH 61

Query: 67 LIDTOEVNENYSVYDFVKEAKVA-DTIISKC-KIPIIVGGTGLYLQSLFEGTHLGGEVNQE 126

LID+++ +E++SV DF A I I +G++P +VGGTGLY+ ++ ++LG E 
Sbjct: 62 LIDIKDPSESFSVADFQDLATPLITEIHERGRLPFliVGGTGLYVNAVIHQFNLGDIRADE 121 


Query: 127 TLMAYREKLE SLSDEDLFEKLT EQSIIIPQVWRRRAIRALELAKF 171 

YR +LE S + L +KL+ + + I N RR IRALE+ K 
Sbjct: 122 D---YRHELEAFVHSYGVQM.HDKLSKIDPKAAAA:HPNNYRRVIRALEIIKLTGKTVTE 178 

Query: 172 -GITOLQNSESPYDVLLIGIOTDRQVLYDRDB5RVDLMMDNGLLDEAKWLYD-NYPSVQAS 229

+ + SPY++++IGL +R VLYDRINRRVD M++ GL+DEAK LYD Q+ 
Sbjct: 179 QARHEEETPSPYNLVMIGLTMERDVLYDRINRRVDQMVEEGLIDEAKKLYDRGIRDCQSV 238 

Query: 230 KGIGYKELFPYFSKQIPLEEAVDKLKQiSrrRRFAOQLTWFRNPJ^JVEFIIWGEEKrYQQKI 289 
+ IGYKE++ Y + LEEA+D LK+N+RR+AKRQLTWPRN+ NV + + + ++ +KI

Sbjct: 239 QAIGYKEmDYLDGNVTLEEAIDTLI<RNSRRYAICRQLTWFRNKANVTWFDMTDVDFDKKI 298 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2611> which encodes the amino acid
sequence <SEQ ID 2612>. Analysis of this protein sequence reveals the following:



Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 202/296 (68%), Positives = 250/296 (84%) 

Query: 5 MRKIKLIAWGPTAVGKTAICIELAKTFNGEIISGDSQQVYQKLDIGTAKASKEEQEQAY 64 

M KIK++ +VGPTAVGKTALGI LAK FNGEIISGDSQQVY++LDIGTAKA++EEQE A 
Sbjct: 1 MTKIKIWIVGPTAVGKTALGISIAKAFNGEIISGDSQQVYRQLDIGTAKATQEEQEAAV 60 

Query: 65 HHLinVREVJffiNYSVYDWKEAKVAIDTIISKGKIPIIVGGTGLYLQSLFEGYHLGGEVK 124 

HHLID+REV E+YS YDFV++A+ +1 I+S+GK+PIIVGGTGLYLQSL EGYHLGG+V+ 
Sbjct: 61 HHL1DIREVTESYSAYDFVQDAQKSISDIVSRGKLPIIVGGTGLYLQSLLEGYHLGGQVD 120 

Query: 125 QETLMAYREKLESLSDEDLFEKLTEQSIIIPQVNRRRAIRALELAKFGNDLQKISESPYDV 184 

QE + AYR +LE L D DL+E+L +1 I QVNRRRAIRALELA+F ++L+N+E+ Y+ 
Sbjct: 121 QEAVKAYRNELEQLDDHDLYERLQVNNITIEQVNRRRAIRALEIiAQFADELENAETAYEP 180

Query: 185 LLIGIjNDDRQVLYDRINFJIYDLMMDNGLLDEAro&YDNYPSVQASKGIGYKELFPYFSKQ 244 

L+IGENDDRQV+YDRIN+RV+ M++NGLL+EAKMLY++YP+VQAS+GIGYKELFPYF + 
Sbjct: 181 LIIGIJSnDDRQVIYDRINQRVNRMIENGLLEEAKWIjYEHYPTVQASRGIGYKELFPYFVGE 240 

Query: 245 IPLF^VDKLKQlTORRFAKRQLTWFRNRm^FItWGEEOTQQKIKRKVSDPLSSK 300 

+ L EA D+LKQNTRRFAKRQLTWFRNRM V F + +Y Q + +V DFL K 
Sbjct: 241 MTLAFASDQLKQOTRRFAKRQLTWFRNRMAVSFTAITAPDYPQVVHDRVRDFLGQK 296 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 863 

A DNA sequence (GBSx0915) was identified in S.agalactiae <SEQ ID 2613> which encodes the amino
acid sequence <SEQ ID 2614>. This protein is predicted to be hflX (hflX). Analysis of this protein
sequence reveals the following:

Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06081 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 182/406 (44%) , Positives = 254/406 (61%) , Gaps = 12/406 (2%) 



Query block starts: 9, 67, 127, 187, 247, 307, 362
Sbjct block starts: 10, 70, 130, 190, 250, 310, 367

(Only the following match lines of this alignment survive in the source.)

+E ED V+VN L+ Q NL LGV+VIDR QLILDIFA RA+S EGKLQV
LAQL Y+LPR+VGQG LSR GGIG4RGPGE++LE +RR IR +++DI++QLK K+R
f TF+I L+GYTNAGKST++N LT YE + LFATLD T+++ I
+V L+DTVGFI LPT LVAAF+STLEE +H DLL HV+D S

A related DNA sequence was identified in S.pyogenes <SEQ ID 2615> which encodes the amino acid
sequence <SEQ ID 2616>. Analysis of this protein sequence reveals the following:

Possible site: 48
>>> Seems to have an uncleavable N-term signal seq



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB06081 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 185/403 (45%), Positives = 246/403 (60%), Gaps = 6/403 (1%)

Query: 13 ERVILLGVEL - -QTTEHFDMSMTELANIAKTAGVKVMASFSQKRERYDSKTFIGSGKLDE 70 

ERV L+ +L T E F+ S+ EL L TA V+ +QKRE + T+IG GKLDE 
Sbjct: 10 ERVFLVACQLPNIWDEQFEASLEELEALTLTAQGTVIDRLTQKREAIEPATYIGRGKLDE 69 

Query: 71 IKAIVEADEIDAVIVNmLTARQNANLEA 130 

+ +E ED VIVN L+ Q NL L V+VIDR QLILDIFA RA+S EGKLQV 
Sbjct: 70 1^IK3^EEQEADLVIWGELSGSQVRNLT?JRLGVRVIDRTQLILDIFAGRAKSREGKLQVE 129 



Query: 131 LAQLKYMLPRLVGQGIMLSRQAGGIGSRGFGESQLELNRRSIRHQIADIERQLTQVEKNR 190
LAQL Y+LPR+VGQG LSR GGIG+RGPGE++LE +RR IR ++ADI++QL K
Sbjct: 130 LAQLNYLLPRIVGQGQGLSRLGGGIGTRGPGETKIjETDRRHIRKRMADIDKQLKHTVB

Query: 191 QTIRDRRVGSDTFKIGLIGYTNAGKSTIMNLLTDDSHyEMJEIjPATLDATTKQLYLE^
R RR + TF+I L+GYTNAGKST++N LT YE + LFATLD T+++ L 4
Sbjct: 190 DRYRARRERNQTFR1ALVGYTNAGKSTLLNRLTASDSYEEDLDFATLDPMTRKMRLP£

Query: 251 QATLTDTVGFIQDLPTELVAAFKSTLEESKYVDLLLHVIDASDPNHSEQEKWLNLLB
+ L+DTVGFI LPT LVAAF+STLEE K+ DLLLHV+D S + V LL
Sbjct: 250

(The blocks starting at Query 311 / Sbjct 310 and Query 369 / Sbjct 370 are not legible in the source; one surviving match-line fragment reads as follows.)

KL h R ++

An alignment of the GAS and GBS proteins is shown below. 

Identities = 326/412 (79%) , Positives = 375/412 (90%) 

Query: 1 MIETKEEQERVILVGWLQDTEMFEMSMEEIASIAKTAGANVVimYYQKRDKYDSKSFIG 60 

MIETK +QERVIL+QVELQ TE+F+MSM ELA+LAKTAG V+ + QKR++YDSK+FIG 
Sbjct: 5 MIETKRQQERVILLGVELQTTEHFDMSMTEIANIAKTAGVKUMASFSQKRERYDSKTFIG 64 

Query: 61 SGKLEEIKAIVFJ^DEIDTVVVNNRLTPRQNSNLEAELGVKVIDRMQLILDIFAMRARSHE 120 

SGKL+EIKAIVEADEID V+VNNRLT RQN+NLEA L VKVIDRMQLILDIFAMRARSHE 
Sbjct: 65 SGKrinKIKAIVFJffiEIDAVIVMNRLTARQNANLEAVLEVKVIDRMQLILDIFAMRARSHE 124 

Query: 121 GKLQVHLAQLKYMLPRLVGQGIMLSRQAGGIGSRGPGESQLELNRRSIRHQISDIERQLK 180

GKLQVHLAQLKYMLPRLVGQGIMLSRQAGGIGSRGPGESQLEIJSRRSIRHQI+DIERQL 
Sbjct: 125 GKLQVHIAQLKYMLPRLVGQGIMLSRQAGGIGSRGPGEEQLELNRRSIRHQIADIERQLT 184 

Query: 181 IVEKNRETTOERRTOSTTFKIGLIGYTNAGKSTIMNVLTDDKQYEANELFATLDATTKQI 240 

VEKKR+T+R+RRV S TFKIGL1GYTNAGKSTIMN+LTDD YEANELFATLDATTKQ+ 
Sbjct: 185 QVEK1SKQTIRDRRVGSDTFKIGLIGYTNAGKSTIMNLLTDDSHYEANELFATLDATTKQL 244 

Query: 241 YLQNQFQVTLTDTVGFIQDLPTELVAAFKSTLEESRHVDLLFHVIDASDPNHEEHEKVVM 300 

YL+NQFQ TLTDTVGFIQDLPTELVAAFKSTLEES++VDLL HVIDASDPNH E EKW+ 
Sbjct: 245 YLENQFQATLTDTVGFIQDLPTELVAAFKSTLEESKYVDLLLHVIDASDPNHSEQEKWL 304

Query: 301 EILKDLDMIDIPRIA1YNKMDVTEQLNATTFPNVRIABKKQGSKDLLRRLIVDEIRHIFD 360 

+LK+LDM++IPRLAIYNK+D+ EQ AT FPN+RI+A+ + SK LLRRLI+D+IR F 
Sbjct: 305 NLLKELDMLNIPRLAIYNKVDIAEQFTATAFPNIRISARSKDSKILliRRLIIDQIRDQFV 364 

Query: 361 EFSIRVHQNQAYKLYDLNKIALLDTYTFEEEYENITGYISPKQKWKLEEFYD 412 

F I +VHQ++AYKLYDLN+ +ALLD YTF++E E+I+GYISPKQ+W+L++FY+ 
Sbjct: 365 PFRIKVHQDKAYKLYDLNRVALLDHYTFDQEIEDISGYISPKQQWRLDDFYE 416 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 864 

A DNA sequence (GBSx0916) was identified in S.agalactiae <SEQ ID 2617> which encodes the amino 
acid sequence <SEQ ID 2618>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2044 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 2619> which encodes the amino acid 
sequence <SEQ ID 2620>. Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3436 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 124/209 (59%) , Positives = 150/209 (71%) 

Query: 1 MIDYIDLALTYGGFTSLDKVYLEKKLDGLSKCQRLDFITPPPSVINAYFAEIYQKQGPEA 60 

M 4-YIDLA TYGGFTSLD YL L L+ QQ+L FITPPPSVINAYFAEIYQKQ P+A 
Sbjct: 5 tOINYIDLAKTYG^FTSLDTNYnfflLlASLTDQQKIiAFITPPPSVINAYFAEIYQKQSPQA 64 

Query: 61 ATDYYFDLSKALGLFPKHLSFDEEKPFIRIiNLSGKSFGFAYLNDQEEASVFSEVKEVITP 120 

ATDYYF+LSKALGLF S F+EEKPF+RLNLSGK+ +GFAY NDQE A VFSE E P 
Sbjct: 65 ATDYYFra^SKALGLFTDQPSFEEEKPFVRLNLSGKAYGFAYQNDQEVALVFSEKAEPKKP 124 

Query: 121 QLIiIjEIAQIFPQYKVYRDRSGIRMAKIDFDETESQNITPETSLLGNVLQLKKDIIKITSF 180 

4-L E+ QIFPQY VY D+ ++M F++ E ++ITP+ +LL + +L I + F 
Sbjct: 125 ELFFELTQIFPQYbWYEDKGQLKMQAKQFEQGECEDITPDDTLLSKIYRIANGITMLKGF 1B4 

Query: 181 NQEELLELVKTKSGKYYYSSQGRESVTYI 209 

N EEL L +T SG+ YY RE +IYI 
Sbjct: 185 NVEELWALSQTFSGQKYYDFAQREFMIYI 213 
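
The summary line that precedes each alignment can be checked arithmetically. The sketch below is an illustration only: the helper names `parse_alignment_header` and `truncated_percent` are hypothetical. It assumes the percentages are truncated rather than rounded, which is consistent with the line above (150/209 is 71.77%, printed as 71%) but is not stated in the source.

```python
import re

# Matches "Identities = 124/209 (59%)" and the analogous Positives/Gaps fields.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_alignment_header(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


def truncated_percent(count, length):
    """Integer percentage, discarding the fractional part (assumed convention)."""
    return (100 * count) // length


line = "Identities = 124/209 (59%) , Positives = 150/209 (71%)"
stats = parse_alignment_header(line)
for name, (count, length, reported) in stats.items():
    print(name, truncated_percent(count, length), reported)
```

A mismatch between the recomputed and the printed value would suggest a transcription error in either the counts or the percentage.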

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 865 

A DNA sequence (GBSx0917) was identified in S.agalactiae <SEQ ID 2621> which encodes the amino 
acid sequence <SEQ ID 2622>. Analysis of this protein sequence reveals the following: 
Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1060 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9895> which encodes amino acid sequence <SEQ ID 9896> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14316 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis]
Identities = 156/309 (50%), Positives = 210/309 (67%), Gaps = 5/309 (1%)

Query: 1 MEIQFLGTGAGQPAKARNVSSLVLKLLDEINEVWMFDCGEGTQRQILETTIKPRKVKKIF 60 

ME+ FLGTGAG PAKARNV+S+ LKKL+E VW+FDCGE TQ QIL TTIKPRK++KIF 
Sbjct: 1 MELLFLGTGAGIPAKARNVTSVALKLIiEERRSVWLFDCGEATQHQILHTTIKPRKIEKIF 60 

Query: 61 ITHMHGDHVFGLPGFLSSRAFQANEEQTDLDIYGPVGIKSFVMTALRTSGSRLPYRIHFH 120

ITHMHGDHV+GLPG L SR+FQ E++ L +YGP GIK+F+ T+L + + L Y + 
[The remainder of this alignment (Query rows starting at residues 121, 181 and 301; Sbjct rows starting at residues 61, 119, 176, 236 and 295) is illegible in the source. One match-line fragment survives: GYRV +KD+ G+D A+ LK]



A related DNA sequence was identified in S.pyogenes <SEQ ID 2623> which encodes the amino acid 
sequence <SEQ ID 2624>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2352 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 253/307 (82%) , Positives = 285/307 (92%) 

Query: 1 MEIQFLGTGAGQPAKARNVSSLVLKLLDEINEVWMFDCGEGTQRQILETTIKPRKVKKIF 60

ME+QFLGTGAGQPAK RNVSSL LKLLDEINEVWMFDCGEGTQRQILETTIKPRK++KIF 
Sbjct: 1 MELQFLGTGAGQPAKQRNVSSIiALKLLDEINEVWMFDCGEGTQRQILETTIKPRKIRKIF 60 

Query: 61 ITHMHGDHVFGLPGFLSSRAFQANEEQTDLDIYGPVGIKSFVMTALRTSGSRLPYRIHFH 120

ITH+HGDH+FGLPGFLSSR+FQA+EEQTDLDIYGP+GIK-I-+V+T+L+ SG+R+PY+IHFH 
Sbjct: 61 ITHLHGDHIFGLPGFLSSRSFQASEEQTDLDIYGPIGIKTYVLTSLKVSGARVPYQIHFH 120 

Query: 121 EFDESSLGKIMETDKFTVYAEKLDHTIFCMGYRWQKDLEGTIJ3AEALKLAGVPFGPLFG 180 

EFD+ SLGKIMETDKF VYAE+L HTIFCMGYRWQKDLEGTLDAEALK AGVPFGPLFG 
Sbjct: 121 EFDDKSLGKIMETDKFEVYAERLAHTIFCMGYRWQKDLEGTLDAEALKAAGVPFGPLFG 180 

Query: 181 ra/KNGENVTLEDGREIIAKDYISEPKKGKVITIIX3DTRKTDASIRIiALGADVLVHESTYG 240 

K+KNG++V BEDGR I AKDYIS PKKGK+ 1 T I +GDTRKT AS++LA ADVLVHESTYG 
Sbjct: 181 KIKNGQDVELEDGRblCAKDYISAPKKGKIITIIGDTRKTSASVKLAKDADVLVHESTYG 240 

Query: 241 KGDERIAKSHGHSTNMQAADIAKQANAKRLLIMJVSARFMGRDCWQMEEDAKTIFSNTHL 300 

KGDERIA++HGHSTNMQAA IA +A AKRLLLHHVSARF+GRDC QME+DA TIF N + 
Sbjct: 241 KGDERIARIffiGHSTNMQAAQIAHEAGAKRLLUmVSARFLGRDCRQMEKDAATIFENVKM 300

Query: 301 VRDLEEV 307 

V+DLEEV 
Sbjct: 301 VQDLEEV 307 
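
Because each alignment row carries its own start and end coordinates, a row can be checked for transcription damage: the number of residues (ignoring gap characters) should equal end minus start plus one. The sketch below is an illustrative aid, not part of the original analysis; `check_row` is a hypothetical helper, and the accepted alphabet (the twenty standard one-letter codes plus B, X and Z) is an assumption.

```python
import re

# "Query: 301 VRDLEEV 307" -> label, start coordinate, sequence, end coordinate.
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)$")

# Assumed alphabet: 20 standard amino acids plus ambiguity codes B, X, Z.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBXZ")


def check_row(line):
    """Return True if the row's residues are valid and match its coordinates."""
    match = ROW.match(line.strip())
    if match is None:
        return False
    _, start, sequence, end = match.groups()
    residues = sequence.replace("-", "")  # gap characters occupy no residue
    if not set(residues) <= AMINO_ACIDS:
        return False  # lowercase letters, digits or symbols indicate damage
    return len(residues) == int(end) - int(start) + 1


print(check_row("Query: 301 VRDLEEV 307"))
print(check_row("Sbjct: 301 VQDLEEV 307"))
```

Rows that fail the check are candidates for repair by comparison with the same sequence where it appears in another alignment of the same Example.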

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 866

A DNA sequence (GBSx0918) was identified in S.agalactiae <SEQ ID 2625> which encodes the amino 
acid sequence <SEQ ID 2626>. This protein is predicted to be similar to ketoacyl reductase. Analysis of this 
protein sequence reveals the following: 

d N- terminal signal sequence 



le Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 3 HTILITGASGGLAQAIINQLPQDD-HLIVTGRSREKLEKLYGKRPNTIjCLSLDITN-DNA 60 

+ 1 ITGASGGL +1 + H++++ R ++L ++ K +1 D 

Sbjct: 7 KRIWITGASGGLGERIAYLOUffiGAHVLLSARRBDRLIEIKRKITEEWSGQCEIFPLDVG 66 

Query: 61 VTNMIEKIYGEFGQIDILINNAGEGSFKEFWDYSDEEVKDMFAVNTFATMSIARQIGHKM 120 

I +■+ + G ID+L1NNAGFG F+ D + +++K MF VN F ++ + + +M 
Sbjct: 67 RLEDIARVRDQIGS IDVLINNAGFGI FETVUDSTLDDMKAMFD VNVFGLIACTKAVLiPQM 126 

Query: 121 SLWSGHIVNIASmGLIATSKASVYGASKFAWGFSNAiRLElJffiKNVYVTSVNPGPIK 180 

K GHI+NIAS AG IAT K+S+Y A+K AV+G+SNALR+EL+ +YVT+VNPGPI+ 
Sbjct: 127 LEQKKGHIINIASQAGKIATPKSSLYSATKHAVLGYSNAIjRMELSGTGIYVTTVNPGPIQ 186 

Query: 181 TGFFAQADPSGDYLASIGRFADTPEKVSKICWSILGKNKRELIJLPFILAFAHKYYSLFPK 240 

T FF+ AD GDY ++GR+ L P+ V+ ++ + + KRE+NLP ++ K Y LFP 
Sbjct: 187 TDFFSIADKGGDYAKNVGRtMjDPDDVAAQITAAIFTKKREINLPRLMNAGTKLYQIjFPA 246 

Query: 241 TADYFARKVFNYK 253 

+ A + K 
Sbjct: 247 LVEKLAGRALMKK 259 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2627> which encodes the amino acid 
sequence <SEQ ID 2628>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB0522S GB:AP001512 oxidoreductase [Bacillus halodurans]
Identities = 107/259 (41%) , Positives = 156/259 (59%) , Gaps = 5/259 (1%) 

Query: 1 MAQRI IVITGASGGIiAQAIVKQLPKEDSLI -LLGRNKERLEHCYQHI DNKECLELD 55 

M ++ I ITGAS GL + + E++++ L R++ERLE+ + + +D 

Sbjct: 1 MRKKTIFITGASSGLGRQLAIDFSWEETVLCLFARSQERLENU'QRIvvENGGEAHlYPVD 60 

Query: 56 ITNPVAIEKMVAQIYQRYGRIDVLINIIAGYGAFKGFEEFSAQEIADMFQVNTLASIHFAC 115 

+ +P +I++ A+ G +DVLIMNAGYG F+ F + E MF+VN + 

Sbjct: 61 IiADPQSIDRSFAEAISAVGVVBVLimJAGYGVFEFFCDSQMDENERMFRVNVFGLMRATA 120 



Query: 116 LIGQKMAEQGQGHLINIVSMAGLIASAKSSIYSATKFALIGFSNALRLELADKGVYVTTV 175

+ M EQG GH+INI S AG IA+AKS+IYSATK A++GF+N+LR+EL G++V+ V
Sbjct: 121 AVLPTMREQGSGHIINIASQAGKIATAKSAIYSATKHAVLGFIWSLRMELKGTGIHVSAV 180

Query: 176 NPGPIATKFFDQADPSGHYLESVGKFTICPNQVAKRLVSIIGKNKRELNLPFSLAVTHQF 235 

NPGPI T FFDQAD G Y V + LP V++++V + K KRELNLP+ + + 
Sbjct: 181 NPGPIQTPFFDQADKEGAYTSK^QRIMLDPEDVSEKIVQLTKKPKREIJ^LPWWMNIGATA 240 

Query: 236 YTLFPKLSDYLARKVFNYK 254 

Y + P+L + LA K F K 
Sbjct: 241 YQVAPRLLELLAGKQFRQK 259 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/251 (61%) , Positives = 200/251 (78%) 

Query: 3 RTILITGASGGLAQAIINQLPQDDHLIVTGRSREKLEKLYGKRPNTLCLSLDITNDNAVT 62 
R I + ITGASGGLAQAI + QLP++D LI+ GR++E+LE Y N CL LDITN A+ 

Query: 63 NMIEKIYGEFGQIDILINNAGFGSFKEFWDYSDEEVKDMFAVNTFATMSIARQIGHKMSL 122 

M+ +IY +G+ID+LINNAG+G+FK F ++S +E+ DMF VNT A++ A IG KM+ 
Sbjct: 64 KMVAQIYQRYGRIDVLINNAGYGAFKGFEEFSAQEIADMFQVNTIiASIHFACLIGQKMAE 123 

Query: 123 WSGHIWIASMAGLIATSKASVYGASKFAWGFSNALRLELAEKNVYVTSVNPGPIKTG 182 

GH++NI SMAGLIA++K+S+Y A+KFA++GFSNALRLELA+K VYVT+VNPGPI T 
Sbjct: 124 QGQGHLINIVSMAGLIASAKSSIYSATKFALIGFSNALRLELaDKGVYVTTVNPGPlATK 183 

Query: 183 FFAQADPSGDYLASIGRFALTPEKVSKKVVSILGB3JKRELNLPFILA.FAHKYYSLFPKTA 242 

FF QADPSG YL S+G+F L P +V+K++VS1+GKNKRELNLPF LA H++Y+LFPK + 
Sbjct: 184 FFDQADPSGHYLESVGKFTLQPNQVAKRLVS1IGKNKRELNLPFSLAVTHQFYTLFPKLS 243 

Query: 243 DYFARKVFNYK 253 

DY ARKVFNYK 
Sbjct: 244 DYLARKVFNYK 254 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 867 

A DNA sequence (GBSx0919) was identified in S.agalactiae <SEQ ID 2629> which encodes the amino 
acid sequence <SEQ ID 2630>. This protein is predicted to be single-stranded-DNA-specific exonuclease 
(recJ). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

197 - 213 ( 197 - 213) 



Final Results 

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14721 GB:Z99118 similar to single-strand DNA-specific 
exonuclease [Bacillus subtilis] 
Identities = 276/772 (35%) , Positives = 447/772 (57%) , Gaps = 45/772 (5%) 

Query: 1 MISAKYSWVIOTQKPDAGFFEASKKE-KISEAA^LIYSRGIKTSAELHHFLQTNLEI^H 59 

M+++K W + Q+PD ++ ++ 1+ VASL+ RG T+ FL T + + 

Sbjct: 1 MLASKMRWEI--QRPDQDKVKSLTEQLHITPLVASLLVKRGFDTAESARLFLHTKDADFY 58 

Query: 60 DPYLI^MDKAVI^IPJ^IENKETILVYGDYDADGMTSASIMKEALDMMGAEVQVYLPNR 119 

DP+ + M +A +RI++AI E 1 + +YGDYDADG+TS S+M L + A+V Y+P+R 
Sbjct: 59 DPFEMKGMKEAADRIKQAISQQEKIMIYGDYDADGVTSTSVMLHTLQKLSAQVDFYIPDR 118 

Query: 120 FTDGYGPNQSVYKYFIEQQDVSLIITVDNGVAGHEAITYAQNQGVDVVVTDHHSMPADLP 179

F +GYGPN+ ++ I+++ SL1ITVD G+A A+ G+DV++TDHH +LP 

Sbjct: 119 FKEGYGPNEQAFRS - IKERGFSLI ITVDTGIAAVHEAKVAKELGLDVI ITDHHEPGPELP 177 

Query: 180 CAYAIIHPEHPDAWYPFPYLAGCGVAFKVAC3UiETlPTEMLDLVAIGTIADMVSLTDEN 239 

AI+HP+ P YPF LAG GVAFK+A ALL +P E+LDL AIGTIAD+V L DEN 
Sbjct: 178 DVI^IVHPKQPGCTYPFKEIAGVGVAFKLAHALLGELPDELLDLAAIGTIADLVPLHDEN 237 

Query: 240 RIMVKAGLEVMKDSERIGLQELISLSNIDLKTLNEETIGFKIAPQLNALGRLDDPNPAIE 299 

R++ GLE ++ + R+GL+ELI LS D+ NEET+GF+ +AP+LNA+GR+ + +PA+ 
Sbjct: 238 RLIATLGLERLRRTNRLGLKELIKLSGGDIGEANEETVGFQLAPRLNAVGRIEQADPAVH 297 

Query: 300 LLTGFDDEESQAIAQMIDQKNEERKEIVQTIFDQAKQMLDQ- - 

LL D E++ +A IDQ N+ER+++V + D+A++M++Q 
Sbjct: 298 LLMSEDSE^AEELAAEIDQIiNKERQKMVSKI/ITaEaiEMVEQQGLDQTAIVVAKAGWNPGV 357 

Query: 357 LGIVAGRILERTGQPVIVLNI--EDGIAKGSARSVEALDIFQAFDQHRELFIAFGGHSGA 414 

+GIVA ++++R +P IVL I E GIAKGSARS+ ++F++ + R++ FGGH A 
Sbjct: 358 VGIVASKLVDRFYRPAIVLGIDEEKG1AKGSARSIRGFMLFESLSECRDILPHFGGHPMA 417 

Query: 415 AGMTLEESKVGDLSQVLCDY1SKKQLDMSQKKTLTIDSELRFDELSLDTVRDFEKIAPFG 474 

AGMTL+ V DL L + + +D ++++++++ + L+PFG 

Sbjct: 418 AGMTLKAEDVPDLRSRIjNEIADOTLTEEDFIPVQEVDLVCGVEDITVESIAEMNMLSPFG 477 

Query: 475 MDNKKPVFLLKDFKVSQARVMGQNGAHLKLKLEQDGQALDLVAFNMGSQLQEFQQAQHLE 534 

M N KP L+++ + R +G N H+K+ + + LD V FN G + + 
Sbjct: 478 MLNPKPHVLVENAVLEDVRKIGANKTHVKMTIRNESSQLDCVGFNKGELQEGIVPGSRIS 537 

Query: 535 IAVTLSVNQWNGATTLQLMLEDARVDGIQLFDIRSK ASSLPHG 577 

4 +S+N+WN QLM++DA V QLFD+R K S+LP 

Sbjct: 538 IVGEMSINEVTONRKKPQLMIKDAAVSEWQLFDLRGKRTWEDTVSALPSAKRAIVSFKEDS 597 

Query: 578 VPILSQEEQSKE VILLTVPDHPQELKQMTQGKQFDAIYFKN 618 

V ++S ++Q+K ++LL P L ++ +GK + IYF 

Sbjct: 598 TTLLQTBDLRREVHVISSKDQAKAFDLDGAYIVLLDPPPSLDMLARLLEGKAPERIYF1F 657 

Query: 619 EIPKNYFISGYGTRDQFASLYKTIYQFPEFDVRYKLKELSSYLHIPDILLIKMIQIFEEL 678 

+++F+S + RD F Y + + FDV+ EL+ + + M ++F +L 

Sbjct: 658 LNHEDHFLSTFPARDHFKWYYAFLLKRGAFDVKKHGSELAKHKGWSVETINFMTKVFFDL 717 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2631> which encodes the amino acid
sequence <SEQ ID 2632>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 220 - 236 ( 220 - 236) 
INTEGRAL Likelihood = -0.11 Transmembrane 667 - 683 ( 667 - 683) 
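
The transmembrane rows above can be parsed to recover each predicted segment and its length. This is a rough sketch under stated assumptions: `parse_segments` is a hypothetical helper, and no meaning is assigned to the second range printed in parentheses, since the source does not define it. Elsewhere in this document negative likelihood values are labelled INTEGRAL and a positive value PERIPHERAL against a threshold of 0.0, so the sketch simply reports the values as given.

```python
import re

# "INTEGRAL Likelihood = -0.16 Transmembrane 220 - 236 ( 220 - 236)"
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one dict per INTEGRAL row, in order of appearance."""
    segments = []
    for match in SEGMENT.finditer(text):
        start, end = int(match.group(2)), int(match.group(3))
        segments.append({
            "likelihood": float(match.group(1)),
            "start": start,
            "end": end,
            "length": end - start + 1,
            # Second range as printed in parentheses; its meaning is not
            # defined in the source, so it is kept without interpretation.
            "printed_range": (int(match.group(4)), int(match.group(5))),
        })
    return segments


rows = """
INTEGRAL Likelihood = -0.16 Transmembrane 220 - 236 ( 220 - 236)
INTEGRAL Likelihood = -0.11 Transmembrane 667 - 683 ( 667 - 683)
"""
segments = parse_segments(rows)
for segment in segments:
    print(segment)
```

Both segments in this example span 17 residues, as do the other complete INTEGRAL rows in this section.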

Final Results 

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 474/731 (64%), Positives = 594/731 (80%)

Query: 1 MISAKYSWVIjNNQKPDAGFFE^SKKEKISE^VASLIYSRGIKTSAELHHFLQTNLENLHD 60 

Ml +KYSW + ++KPD GFF+ +K + +++ A LIY RGI+T L FL +L LHD 
Sbjct: 1 MIKSKYSWKIKDKKPDDGFFKIiAKTKGLTQTAAQLIYDRGIRTEEALDEFLTADLSQLHD 60 

Query: 61 PYLLM)^KAVNRIRRAIFJJNETILVYGDYDATJGMTSASIMKEALDiVIMGAEVQVY 120 

Query: 121 TDGYGPNQSVYKYFIEQQDVSLIITVDNGVAGHEAITYAQNQGVDVVVTDHHSMPADLPC 180

TDGYGPNQSVYKYFIEQ+ VSLIITVDNGVAGHEAI YAQ Q VDV+VTDHHS+P +LP 
Sbjct: 121 TDGYGPNQSVYKYFIEQEAVSLIITTONGVAGHEAIRYAQEQEVDVIVTDHHSLPEEBPE 180 

Query: 181 AYAIIHPEHPDAl^PFPYIiAGCGVAFKVACALLETIPTEMLDLVAIGTIADMVSLTDENR 240 

A+AI I HPEHPDA+YPF +LAGCGVAFK+A ALLE++PT+ LDLVAIGTIADMvSLT ENR 
Sbjct: 181 AFAIIHPEHPDADYPFKHLAGCGVAFKIATALLESLPTDCLDLVAIGTIADMVSLTGENR 240 

++VK GL ++K +ER+GLQEL+SLS IDL+ KE+ IGF+IAPQLNALGRLDDPNPAIEL 
Sbjct: 241 VLVKNGLAMLKHTERVGLQELMSLSPIDLEHFNEDAIGFQIAPQLNALGRLDDPNPAIEL 300 

Query: 301 LTGFDDEESQAIAQMIDQKNEERKEIVQTIFDCAMQMLDQTKPVQVLAKENWHPGVLGIV 360 

LTGFDD+E+QAIA MI +KNEERK +VQ IFDQAM M+D KPVQVLA+ WHPGVLGIV 
Sbjct: 301 LTGFDDQEAQAIALMIKKKNEERKALVQDIFDQA^4AMVDPQKPVQVLAQAGWHPGVLGIV' 360 

Query: 361 AGRIJjERTGQPVIVIiNIEDGIAKGSARSVEALDIFQAFDQHRELFIAFGGHSGAAGMTLE 420 

AGRI+E GQ V+VL I++G ARGSARS+EA++IF+A + RELF AFGGH+GAAGMTL 
Sbjct: 361 AGRIMETIGQTVWLTIDNGFAKGSARSLEAIKIFEALNGKRELFTAFGGHAGAAGMTLP 420 

Query: 421 ESKVGDLSQVLCDYISKKQLDMSQKKTLTIDSELRFDELSLDTVRDFEKLAPFGMDNKKP 480 

+ LS LC ++ ++ LD + K TLTID L D+LSLD ++ +KLAP+GMD++KP 
Sbjct: 421 VDNLFALSDFLCQFVIERGLDQTAKNTLTIDERLSLDDLSLDILKSLDKLAPYGMDHQKP 480 

Query: 481 VFIiLKDFKVSQARVMGQNGAHLKLKLEQDGQALDLVAFWMGSQLQEFQQAQHLELAVTLS 540 

VF +KD +VSQAR +GQ+ +HLK K+ Q + D++AF GSQLQEF+QA LELAVTLS 
Sbjct: 481 VFYVKDIRVSQARTIGQDQSHLKFKVSQGKASFDVLAFGQGSQLQEFRQATGLELAVTLS 540 

Query: 541 VNQWNGATTLQLMLEDARVDGIQLFDIRSKASSLPHGVPILSQEEQSKEVILLTVPDHPQ 600 

VN WNG T+LQ ML DARVDG4QL D+R+K + +P G+P + ++ ++ +++ +P+ + 
Sbjct: 541 ViraWNGNTSLQFMLVDARVDGVQLLDLRTKTAKVPEGIPTIEEDPNARVILINDIPEDFK 600 

Query: 601 ELKQMTQGKQFDAIYFiOsEIPKNYFISGYGTREQFASLYKTIYQFPEFDVRYKLKELSSY 660 

+ K FDAIYFKN++ Y+++G+G+R+QFA LYKTIYQFPEFD+R+KL ELS Y 

Sbjct: 601 TWRNQFVHKDFDAIYFKNQMKHPYYLTGFGSREQFAKLYKTIYQFPEFDLRHKLTELSHY 660 

Query: 661 LHIPDILLIKMIQIFEELHFVTITEGIMTVNKEAEKRDISESQIYQELKETVKFQELMAL 720 

L+I +LLIK+IQIFEEL FVTI +G+MTVN +A+KR+ISES IYQ+LKE VKFQE+MAL 
Sbjct: 661 ^IEKLLLIKLIQIFEELSFVTIDDGLMTVNPQAQKREISESHIYQDLKELVKFQEIMAL 720 

Query: 721 GTPKEIYDFMM 731 

+PKE+YD+++ 
Sbjct: 721 ASPKEMYDYLV 731 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 868 

A DNA sequence (GBSx0920) was identified in S.agalactiae <SEQ ID 2633> which encodes the amino 
acid sequence <SEQ ID 2634>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4114 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 869 

A DNA sequence (GBSx0921) was identified in S.agalactiae <SEQ ID 2635> which encodes the amino 
acid sequence <SEQ ID 2636>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.10 Transmembrane 15 - 31 ( 14 - 33)

Final Results

bacterial membrane --- Certainty=0.3039 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88584 GB:M18954 fructosyltransferase [Streptococcus mutans]
Identities = 67/219 (30%) , Positives = 106/219 (47%) , Gaps = 31/219 (14%) 

Query: 1 MRPI VRKKMYKKGKFWWAGI VT - 1 LGGS AI LGQDVKAEQAEAVTST I SEKTDS SQT I SD 59
M VRKKMYKKGKFWWA I T +L G + V+A++A + T SE + SQ +
Sbjct: 1 METKVRKKMYKKGKFWWAT I TTAMLTGIGL - -SSVQADEANS-TQVSSELAERSQVQEN 57

Query: 60 TSKLTLPVNSSKAMKNSAEPLIKTAFATSVSSNPREIAATPVKTFDASSKOTVKASTAEH 119
T+ SS A +N A KT + S+NP AA V+ D ++KV+ + E
Sbjct: 58 TTA SSSAAENQA KTEVQETPSTNP- - -AAATVENTDQTTKVITDNAAVES 104

[The rows starting at Query residues 120 and 167 and at Sbjct residues 105 and 165 are illegible in the source.]

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8677> and protein <SEQ ID 8678> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 5 
McG: Discrim Score: 9.08 
GvH: Signal Score (-7.5): -3.94 

Possible site: 34 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -5.10 threshold: 0.0 

INTEGRAL Likelihood = -5.10 Transmembrane 7 - 23 ( 6 - 25) 
PERIPHERAL Likelihood = 4.03 694 
modified ALOM score: 1.52 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.3039 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

31.1/52.1% over 749aa 

Streptococcus mutans 

EGAD|14681| levansucrase precursor Insert characterized

SP|P11701|SACB_STRMU LEVANSUCRASE PRECURSOR (EC 2.4.1.10) (BETA-D-FRUCTOFURANOSYL
TRANSFERASE) (SUCROSE
6-FRUCTOSYL TRANSFERASE). Edit characterized
GP|153636|gb|AAA88584.1||M18954 fructosyltransferase Insert characterized
PIR|B28551|B28551 levansucrase (EC 2.4.1.10) precursor - (strain GS-5) Insert
characterized

ORF02172(295 - 1731 of 3138)

EGAD|14681|14686(7 - 756 of 797) levansucrase precursor {Streptococcus mutans}
SP|P11701|SACB_STRMU LEVANSUCRASE PRECURSOR (EC 2.4.1.10) (BETA-D-FRUCTOFURANOSYL
TRANSFERASE) (SUCROSE 6-FRUCTOSYL TRANSFERASE). GP|153636|gb|AAA88584.1||M18954
fructosyltransferase {Streptococcus mutans} PIR|B28551|B28551 levansucrase (EC 2.4.1.10)
precursor - Streptococcus mutans (strain GS-5)

%Match = 2.9
%Identity = 31.1 %Similarity = 52.1

Matches = 83 Mismatches = 115 Conservative Sub.s = 56 



LPEHLENQSYQH*PYQH*YQ*RHNHHQYLVQ*ERVQQLIQRAPCL*FQFWSYXXXN*LXXYR*KKlWiaCGKFWVvAGIV 

lllllllllll!) I 

METKVrkkmykkgk • •. i 1 i:t 



TILGGSAILGQDVI<AECMAVTSTISEKTDSSQTISDTSKLTLPTOSSEAMKNSAEPLIKTAFATSVSSNPREIAATPVK 
| : = | : : ||: || | |: :: : :: | ::|| || : |,|| || |: 

TAM LTGIGLSSVQADEANST QVSSELAERSQVQENTTASSSAAENQAKTEVQETPSTNP- - -AAATVE 



TFDASSKVWKASTAEHSANOTNSN- - - VNQVANDSEVITCQN STKQLPTVTYSAHVQDIGW QKSVDNAT 

| ,:||, : | I::] | : 1 : : | | :]|: | : : ] :) | 

NTDQTTKVITDNAAVESKASKTKDQAATVTKTAASTPEVGQTNEKDKAKATKFjADITTPKNTIDEY 



VSGTVGQEKQVEA- - - IKLSIKAPEG ITGKLSYKTY 

:= = HUH =11= II =11= 
INLSSLTQKQVEALNKVKLTSDAQTGHQMTYQEFDKIAQTLIAQDE VGTLDTAYLPGENDGYIDWNVIGGYGLKPH 



912 942 972 1002 1032 1062 1092 1122 

WGOJSWQPSVESGQVSGTVGQSRPIEALSINLTDNLQKLYDVYYRVHV^ 
II :||:| 

Tunn-vn-cmr 



1152 1182 1209 1239 1269 1290 1320 1350 

LKGQSVLTPTIPKEERPVLNYQVKV-GQNGWQSNXLEGQMAGTLGESKALDG VKFTLSTLKYGDILYRTHVQDKGWG 

|: | :,,:,| |: : |: || : :|: | | | :| 



1641 1S71 1701 1731 1761 1791 1821 1851 

EI SYQTYLQECDGWKPTVLEGQLGGSIGLSKSIKAII<]LNLGSTALGNIEYRTFIJ3GSGWQTVvNSGRESNOTNESQQ 

60 ||:| : | |:| I 

--GGNISi/KPSQKSIM^KETKKAHHVSTEKKQKKGNSFFAALLALFSAFCVSIGF 

750 760 770 780 790 



SEQ ID 8678 (GBS243) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 57 (lane 7; MW 94kDa).

GBS243-His was purified as shown in Figure 208, lane 10.

Example 870

A DNA sequence (GBSx0922) was identified in S.agalactiae <SEQ ID 2637> which encodes the amino 
acid sequence <SEQ ID 2638>. This protein is predicted to be adenine phosphoribosyltransferase (apt).
Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.86 Transmembrane 61 - 77 ( 59 - 77)
INTEGRAL Likelihood = -0.64 Transmembrane 137 - 153 ( 137 - 153)

Final Results

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC46040 GB:U86377 adenine phosphoribosyltransferase; Apt 
[Bacillus subtilis] 
Identities = 110/170 (64%) , Positives = 135/170 (78%) 

Query: 1 MDI J NNYIASIENYPQEGITFRDISPL^IADGKAYSYAVRErVQYAADKDIDMIVGPEARGF 60 

MDL Y+ 4 +YP+EG+ P+DI+ LM G Y YA +IV+YA +K ID++VGPEARGF 
Sbjct: 1 MDLKQYVTIVPDYPKEGVQPKDITTLMDKGDVYRYATDQIVEYAKEKQIDLWGPEARGF 60 

Query: 61 IVGCPVAYALGIGFAPTOKPGKIiPREVISADYEKEYGLDTLTMEIRDAIKPGQRVLIVDDL 120 

I +GCP VAYALG+GFAPVRK GKLPREVI DY EYG D LT+H DAIKPGQRVLI DDL 
Sbjct: 61 IIGCPVAYALGVGFAPVRKEGKLPREVIKVDYGLEYGKDVLTIHKDAIKPGQRVLITDDL 120 

Query: 121 LATGGTVKATIEMIEKLGGWAGCAFLVELDGLNGRKAIEGYDTKVLMNF 170 

LATGGT++ATI+++E+LGGWAG AFL+EL L+GR +E YD LM + 
Sbjct: 121 LATGGTIEATIICLTOELGGVVAGIAFLIELSYLDGRNKLEDYDILTLMKY 170 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2639> which encodes the amino acid
sequence <SEQ ID 2640>. Analysis of this protein sequence reveals the following: 
Possible site: 40 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.300 (Affirmative) < succ>

bacterial membrane --- Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

GB:Z99120 similar to opine catabolism [Bacillus sub... 231 1e-59
>GP:CAB15253 GB:Z99120 similar to opine catabolism [Bacillus subtilis]
Score = 231 bits (583), Expect = 1e-59

Identities = 138/363 (38%) , Positives = 212/363 (58%) , Gaps = 11/363 (3%) 

Query: 5 IIGAGIVGSTAAYYLQQSGQKEVTIFDHGQ-GQATKAAAGIISPWFSKRRNKVWYRMARL 63 

I+GAGI+G++ AY+L ++G + VT+ D + GOAT AAAGI+ PW S+RRN+ WY++A+ 
Sbjct: 6 IVGAGIIiffiSTAYHIAKTGAR-VTVIDRKEPGCJATDAAAGIVCPWLSQRRNQDWYQIiAKG 64 

Query: 64 GADFYQQLINDLKEDGFATDFYQQNGIYVLKKQEEKLRDLYELALARKVESPIIGELAIK 123 

GA +Y+ LI+ L++DG + Y++ G + KL + E A R+ ++P IG++ 

Sbjct: 65 GARYYKDLIHQLEKDGESDTGYKRVGAISrHTDASKLDKMEERAYKRREDAPEIGDITRL 124 



Query: 124 NRKELGNDFKGLIGFDNCLYASGAARVEGAALCETLLKAS GYPV1RQKVTLKQQG- - 178 
[The remainder of this alignment (Sbjct rows starting at residues 125, 185, 245, 305 and 365, and the corresponding Query rows) is illegible in the source. The following match-line fragments survive.]

++ SGAARV G ALC +LL A+ G VI +

F D VI+ AGAW ++L+PLG

h F+ G+I G +HEND G DL ++ +AL P L +A

^ P G V ++ LY A+GLG+SGLT+GP +G ELA+L+LG +



An alignment of the GAS and GBS proteins is shown below. 
Identities = 150/172 (87%), Positives = 161/172 (93%) 

Query: 1 MDLNNYIASIENYPQEGITFRDISPLMADGKAYSYAVREIVQYAADKDIDMIVGPEARGF 60 

MDL NYIASI++YP+ GITFRDISPLMADGKAYSYA+REI QYA DKDIDM+VGPEARGF 
Sbjct: 1 MDLTNYIASIKDYPKAGITFHDISPLMADGKAYSYAIREIAQYACDKDIDMWGPEARGF 60 

Query: 61 IVGCPVAYALGIGFAPVRKPGKLPREVISADYEKEYGLDTLTMHADAIKPGQRVLIVDDL 120 

I+GCPVA LGIGFAPVRKPGKLPR+V+SADYEKEYGLDTLTMHADAIKPGQRVLIVDDL 
Sbjct: 61 IIGCPVAVELGIGFAPVRKPGKLPRDWSADYEKEYGLDTLTMHADAIKPGQRVLIVDDL 120 

Query: 121 IiATGGTVKATIEMIEKLGGWAGCAFLVELDGLNGRKAIEGYOT 172 

LATGGTVKATIEMIEKLGG+VAGCAFL+EL+GLNGR AI YD KVLM FPG 
Sbjct: 121 1ATGGTVKATIEMIEKLGGIVAGCAFLIELEGLNGRHAIRNYDYKVLMQFPG 172 

SEQ ID 2638 (GBS419) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 6; MW 22.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 172 (lane 4; MW 47.5kDa).

GBS419-GST was purified as shown in Figure 219, lane 6-8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 871 

A DNA sequence (GBSx0923) was identified in S.agalactiae <SEQ ID 2641> which encodes the amino 
acid sequence <SEQ ID 2642>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0847 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA11244 GB:D78182 ORF2 [Streptococcus mutans]
Identities = 140/225 (62%) , Positives = 178/225 (78%)

Query: 1 MTYLEQYQSGQLTLPSALFFHFKSIFKTADDFLVWQFFYLQNTTNLSDLTPSRIATSLDK 60
M++L+ Y+SG L LPSAL FH+K IF ADDFLWQFFY QNTT + D+ S+IAT++ K
Sbjct: 1 MSFLQHYKSGNLVLPSALLFHYKDIFSNADDFLVWQFFYFQNTTKMEDIATSQIATAIGK 60

Query: 61 TVADINRSISNLTSQGLLDVKTIEIjNHEIEIIFDTSPVFAKLDKLFEEDNQVIIDNKTSD 120 

TV ++NRS+SOT, SQ LLD+KTIEL+ E E++FD + JCLD L ++ + + 
Sbjct: 61 TVPEVNRSVSKLISQELLDMKTIELDGESEVLFDATLALKKLDDLLTAADETTVSSSKGT 120 

Query: 121 SNRLKDLVGDFERELGRLLSPFELEDLQtCTLQEDQTDPDIVRAALREAVFHGKTSWNyiN 180 

SN LKDLV DFERELGR+LSPFELEDLQKT+ +D+TDPD+VR+ALRFAVFNGKT4-WNYI 
Sbjct: 121 SNALKDLVEDFERELGRMLSPFELEDLQKTVSDDK"DPDLTOSALRFAA?FNGKTNWNYIQ 180 

Query: 181 AILPJSIWRREGLTTLRQIEERKQAREDNQMKDLAISDDFKNAMNLW 225 

AILRNWRREG++TLRQ+EER++ RE ++ +SDDF +AMNLW 

Sbjct: 181 AILPl^^EGISTLRQVEERRKEREQANPANVTVSDDFIjSAMNLW 225 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2643> which encodes the amino acid 
sequence <SEQ ID 2644>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0 . 3000 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:BAA11244 GB:D78182 ORF2 [Streptococcus mutans]
Identities = 154/228 (67%), Positives = 188/228 (81%), Gaps = 1/228 (0%) 

Query: 1 MSFLEHYKSGNLVIPSALLFHYKDLFKSSDDFLVWQFFyLQNTTKRDDLAPSQIAHALGK 60 

MSFL+HYKSGNLV+ PSALLFHYKD+ F ++DDFLVWQFFY QNTTK +D+A SQIA A+GK 
Sbjct: 1 MSFLQHYKSGNLVLPSALLFHYKDIFSNADDFLWQFFYFQNTTKMEDIATSQIATAIGK 60 

[The remainder of this alignment (rows after residue 60) is illegible in the source.]

An alignment of the GAS and GBS proteins is shown below. 

Identities = 144/225 (64%) , Positives = 179/225 (79%) , Gaps = 1/225 (0%)

Query: 1 MTYLEQYQSGQLTLPSALFFHFKSIFKTADDFLVWQFFYLQNTTNLSDLTPSRIATSLDK 60
M++LE Y+SG L +PSAL FH+K +FK++DDFLVWQFFYLQNTT DL PS+IA +L K
Sbjct: 1 MSFLEHYKSGNLVIPSALLFHYKDLFKSSDDFLVWQFFYLQNTTKRDDLAPSQIAHALGK 60

[The Query row for residues 61-119 is illegible in the source apart from its final residue.]
+V ++N+ +S+L +Q LLDM+TIEL GE E++FDA+ LKLDL + T+ +T
Sbjct: 61 TVPEVNRSVSMISQELLDMKTIELDGESEVIiFDATLALKKLDDLLTAADETTVSSSKGT 120

[The rows starting at Query residues 121 and 181 and at Sbjct residue 180 are illegible in the source. One match-line fragment survives: AILRNWR+EG+ LRQ+EER++ RE + + IS+DF +AMNLW]

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 872 

A DNA sequence (GBSx0924) was identified in S.agalactiae <SEQ ID 2645> which encodes the amino 
acid sequence <SEQ ID 2646>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1617 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA11245 GB:D78182 ORF3 [Streptococcus mutans]

Identities = 134/226 (59%) , Positives = 170/226 (74%) 

Query: 2 DLQLSKRLQKVANYVPKGARLLDVGSDHAYLPIFLLQMGYCDFAIAGEVWGPYQSALKN 61 
++ LS RLQ+VA++VPKGARLLDVGSDHAYLPI+LL+ G DFA+AGE++ GPY+SA+ N 
Sbjct: 7 EVSLSHRLQEVASFVPKGARLLDVGSDHAYLP1YLLEQGL1DFAVAGEIIKGPYESAVAN 66

Query: 62 VSEHGLTSKIDWLANGLSAFEEADNIDTITICGMGGRL1ADILNNDIDKLQHVKTLVLQ 121 

V+E GL+ +1 VRLA+GL+A + D+ID ITICGMGGRLIADIL DKL VK L+LQ 
Sbjct: 67 VNESGLSGQIA VRLADGLAALNDNDDIDLITI CGMGGRL IAD I LAAGSDKLNSVKQLI LQ 126 


Query: 122 PNNREDDLRKWIAflOTJFEIVAEDILTEMDKRYEILWKHGHIWIiTAKELRFGPFLLSNNT 181 

PNN EDDLR WL ANDF I AE ++ + K YEILW+ G + L+ K+LRFGPFL + 
Sbjct: 127 PNNCEDDLRSWLVANDFMIKAEKMVKDI^KYYEILvvEKGKITLSDKDIjRFGPFLRQERS 186 

Query: 182 TWKEKHCj^IJOKLTEAIjNSIPNSKMEERAlLEDKIQDIKEVLDES 227

++FKE+W+ EL KL AL +P K + L ' KI+ I+EVL ES 
Sbjct: 187 SIFKERWRKEIAKIiEIiALTRVPAKKKADNMF^STKIEQIREVLYES 232 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2647> which encodes the amino acid 
sequence <SEQ ID 2648>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0803 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 145/224 (64%) , Positives = 173/224 (76%)



Query: 1 MDLQLSKRLQKVANYVPKGARLLDVGSDHAYLPIFLLQMGYCDFAIAGEWNGPYQSALK 60 

MD QLS RL +VA YVPKG +LLDVGSDHAYLPIFL++ AIAGEW GPY+SALK 

Sbjct: 1 MDSQLSNRLAQVAAWPKGVKLLDVGSDHAYLPIFLvETNQISAAIAGEVVRGPYESALK 60 

Query: 61 mrSEHGLTSKIDWIANGLSAFEEADNIDTITICGMGGRLIADILNNDIDKLQHVKTLVL 120 

HV++ GL I VRLANGL+AFEEAD++ ITICGMGGRLIADIL +KLQ LVL 
Sbjct: 61 NVTQSGLAEHIQTOIANGLAAFEEADDVTAITICGMGGRLIADILFAGKEKLQGIERLVL 120 

Query: 121 QPI^EDDLRKWIAANDFEIVAEDILTEITOKRYEILVVKHGHMNLTAKELRFGPFLLSNN 180 
QPNNREDDLR WL+ N F+IVAE 1+ ENDK YEI+V +HG L+A ELRFGP+L 

Sbjct: : 



Query: 181 TTVFKEKWQNELNKLTFALNS I PNSKMEERAI LEDKI QDI KEVL 224 
+ VFKEKWQ E++KL +AL+ IP K +ER +L KIQ IKEV+
Sbjct: 131 SWFKEKWQREMDKLAyALSCIPEEKTQERQLIiLTKIQQIKEVI 224 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 873 

A DNA sequence (GBSx0925) was identified in S.agalactiae <SEQ ID 2649> which encodes the amino 
acid sequence <SEQ ID 2650>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3245 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9893> which encodes amino acid sequence <SEQ ID 9894> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA11246 GB:D78182 ORF4 [Streptococcus mutans]

Identities = 187/262 (71%) , Positives = 224/262 (85%) 





Query: 2 MKARELIDVYETYCPQELSMEGDISGLQIGSLDKEIKTVMVALDVRETTVAEAIERQVDL 61
MKA ++I YE YCPQ+LiS+EGDISGLQIG+LDKEIK +M+ALDVRETTVAEAIE++VDL
Sbjct: 1 MKASQIIKRYEAYCPQDLSLEGDISGLQIGTLDKEIKRLMIALDVRETTVAEAIEKKVDL 60

Query: 62 LIVKHAPIFRPLKDLVATPONKIYIDLLKSDIAVWSHTNIDIVPNGLNDWFCELLDIQY 121
LIVK1IAPIFRPLK+LV T QN IY +L+K DIAVYVSHTNIDIVP+GLNDWFC+LLDI+
Sbjct: 61 LIVKHAPIFRPLKNLVETAQITOIYFNLIKHDIAVYVSHTNIDIVPDGLNDWFCDLLDIKN 120

Query: 122 PDILSETSNGYGIGRIGDIRPQSFEFFAWKIKDVFGLDSVRLVSYDKSNPEIQRVAICGG 181
ILS + + YGIGR+GDI P SFE A K+K +F LDSVRL.VSY ++NP I R+AICGG
Sbjct: 121 RRILSPSKDDYGIGRVGDISPLSFEDLAKKVKKIFNLDSVRLVSYGEKNPLISRIAICGG 180

Query: 182 SGQSFYKEAIAKGADVFVTGDIYYHTAQEMI'rNGLLAIDPGHHIEVLFVSKIATMIEQWK 241
SGQSFY+EA+ KGA V++TGDIYYHTAQEM+TNGLLA+DPGHHIEVLFV K+A + W
Sbjct: 181 SGQSFYQEALTKGAQVYITGDIYYHTAQEMLTNGLIALDPGHHIEVLFVRKLAEKFQTWS 240

Query: 242 LEKGWDISVLESKAPTNPFYHM 263
++ WDI++LES+ TNPFYH+
Sbjct: 241 CQEWTOITILESQVNTNPFYHL 262





A related DNA sequence was identified in S.pyogenes <SEQ ID 2651> which encodes the amino acid
sequence <SEQ ID 2652>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1804 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 159/262 (64%), Positives = 214/262 (81%)

Query: 2 MKARELIDVYETYCPQELSMEGDISGLQIGSLDKEIKTVMVALDWET1VAEAIERQVDL 61
MKA+ LID Y E +CP +LSMEGD+ GLQ+GSLDK+I+ VM+ LD+RE+TVAEAI+ +VDI,
Sbjct: 3 MKAKTLIDAYEAFCPLDLSMEGDVKGLQMGSLDKDIRKVMITLDIRESTVAEAIKNEVDL 62

Sbjct: 63

Query: 122
LSET G+GIGRIG ++ Q+ E A K+K VF LD+VRL+ YDK NP I ++AICGG
Sbjct: 123

Query: 182
SG FY++A+ KGADV++TGDIYYHTAQEM+T GL A+DPGHHIEVLF K+
Sbjct: 183

Query: 242
E GWD+S++ SKA TNPF H+
Sbjct: 243

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 874 

A DNA sequence (GBSx0926) was identified in S.agalactiae <SEQ ID 2653> which encodes the amino 
acid sequence <SEQ ID 2654>. This protein is predicted to be (). Analysis of this protein sequence reveals 
the following: 

Possible site: 41
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15253 GB:Z99120 similar to opine catabolism [Bacillus subtilis] 
Identities = 148/368 (40%), Positives = 211/368 (57%), Gaps = 13/368 (3%) 

Query: 1 MKKIAIIGAGAVGATIAYYLSKEKDIQVTVFDYGV-GQATKAAAGIISPWFSKRRNKAVfY 59 

MK I+GAG +GA+ AY+L+K +VTV D GQAT AAAGI+ PW S+RRN+ WY 
Sbjct: 1 MKSYIIVGAGILGASTAYHI^T-GARVTVIDRKEPGQATDAAAGIVCPWLSQRRNQDWY 59 

Query: 60 RMARM3ADFYSKLVTDLQKDGFETKFYQQTGVFLLKKDESQLESLFALADKRRIjESPLIG 119 

++A+ GA +Y L+ L+KDG Y++ G + D S+L+ + A KRR ++P IG 

Sbjct: 60 QLAKGGARYYKDLIHQLEKDGESDTGYKRVGAISIHTDASPO.DKMEERAYKRREDAPEIG 119 

Query: 120 DLQILNKSEANTHFPEL-DGYEQLLYASGGARVEGADLTRILLEAS GVNVIKDEVHF 175 

D+ L+ SE FP L DGYE ++ SG ARV G L R IiL A+ G VIK 
Sbjct: 120 DITRLSASETKKLFPIIADGYES-WISGA^WGRALCRSLLSAAEKRGATVTKGNASL 178 

Query: 176 TITDNGFRVQGIDFDKLVIASGAWLAKILDEHl'n'QVDVRPQKGQLRDYYFSNINT 230 

T+T + D +++ +GAW +IL V QK Q+ + ++ +T 

Sbjct: 179 LFENGTVTGVQrDTKQFAADAVIVTAGAWANEILKPLGIHFQVSFQKAQIMHFEMTDADT 238 

Query: 231 GKYPVVMPEGELDIIPFDNGKVSVGASHENDMAF-DLNIDFKVLDKFEEQAIGYFPQLKK 289 

G +PWMP + 1+ FDNG++ GA+HEND DL + + +A+ P L 

Sbjct: 239 GSWPVVMPPSDQYILSFDNGRIVAGATHENDAGLDDLRVTAGGQHEVLSKA1AVAPGLAD 298 

Query: 290 ADTTSERVGIRAYTSDFSPFFGPVPCMEGAYAASGLGSTGLTVGPLIGYELCQLIIjNKEN 349 

A RVG R +T F P G VP ++G YAA+GLG++GLT+GP +G EL +L+L K+ 

Sbjct: 299 AARVETRVGFRPFTPGFLPWGAVPNVQ3LYAAIIGLGASGLTMGPFLGAELAKLVLGKQT 358 

Query: 350 QLNLEDYD 357 
+L+L YD
Sbjct: 359 ELDLSPYD 366 
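
The summary line that heads each alignment (Identities, Positives and, where present, Gaps) can be read back into numbers so that the printed percentages can be compared with the underlying counts. The sketch below is illustrative only: `parse_summary` is a hypothetical helper name, and no assumption is made about how the printed whole-number percentage was rounded, so both the printed and the recomputed values are returned.

```python
import re

# One "<name> = <count>/<length> (<pct>%)" item of a BLAST summary line.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return per-statistic counts plus the printed and recomputed percentages."""
    stats = {}
    for name, count, length, printed in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        stats[name.lower()] = {
            "count": count,
            "length": length,
            "printed_pct": int(printed),
            "computed_pct": 100.0 * count / length,
        }
    return stats


if __name__ == "__main__":
    line = "Identities = 148/368 (40%), Positives = 211/368 (57%), Gaps = 13/368 (3%)"
    for name, values in parse_summary(line).items():
        print(name, values["count"], values["length"],
              values["printed_pct"], round(values["computed_pct"], 2))
```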

A related DNA sequence was identified in S.pyogenes <SEQ ID 2655> which encodes the amino acid
sequence <SEQ ID 2656>. Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 211/360 (58%), Positives = 262/360 (72%)

Query: 3 KIAIIGRGAVC^TIAyYLSKEia3IQVTVFDYGVGQATKaAAGIISPWFSK3JR]^WYRm 62
KIAIIGAG VG+T AYYL + +VT+FD+G GCATKAAAGIISPWFSKRRNK WYRMA
Sbjct: 2 KI AI IGAGI VGSTAAYYLQQSGQKE VTI FDEGQGQATRARAGI I SPWFSKRRNKVWYRMA 61

Query: 63 RLGADFYSKLVTDLQKDGFETKFYQQTGVFLLKKDESQLESLFALADKRRLESPLIGDLQ 122
RLGADFY +L+ DL++DGF T FYQQ G+++LKK E +L L+ LA R++ESP+IG+L
Sbjct: 62 RLGADFYQQLIIODLKEDGFATDFYQQNGIYvIiKKQEEKIjRDLYELALARKVESPIIGELA 121

Query: 123 IIiNKSEAOTHFPELDGYEQLLYASGGARVEGADLTRILLFASGVNVIKDEVHFTITDNGF 182
IB+E F L G++ LYASG ARVEGA L LL+ASG VI+ +V +G+
Sbjct: 122 IKNRKEIXSNDFKGIilGFDNCLYASGAARvEGAALCETbliKASGYPVIRQKVTIjKQQGSGY 181

Query: 183 RVQGIDFDKLVXASGAWLAKILDEHNYQVDVRPQKGQLRDYYFSNIHTGKYPVVMPEGEL 242
+ G FD+++LA+GAWL +L YQVDVRPQKGOJj DY +1 + YPWMPEGE+
Sbjct: 182 EIAGHYFDQVILAAGAWLPDLLRPLGYQVDVRPQKGQLLDYDVHHIISDTYPWMPEGEI 241

Query: 243 DIIPFDNGKVSVGASHEND^FDI^IDFKVLDKFEEQAIGYFPQLKKADTTSERVGIRAY 302
D+IPF+ GK+SVG SHEND +DL D++VL K E QA+ Y P LK+A + RVGIRAY
Sbjct: 242 DLIPFNQGKISVGTSHEITOKGYDLEPDWQVT^BCKLEMQALTYLPLLKEATQKTCRVGIRAY 301

Query: 303 TSDFSPFFGPVPCMEGAYAASGLGSTGLTVGPLIGYELCQLIIjNKENQLNLEDYDITKYV 362
TSD+SPF4G V ++ Y ASGLGS+GLTVGPLIGYEL QL+L E L DY Y+
Sbjct: 302 TSDYSPFYGQVSGLKNLYTASGLGSSGLTVGPLIGYELAQLLLGHEGLLTPSDYSPEPYL 361

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8679> and protein <SEQ ID 8680> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 2
McG: Discrim Score: 4.44
GvH: Signal Score (-7.5): 0.81
Possible site: 41
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 7.32 threshold: 0.0
PERIPHERAL Likelihood = 7.32 153
modified ALOM score: -1.96

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 
45.2/62.7% over 163aa
Bacillus subtilis
EGAD|109026| hypothetical protein
SP|O32159|YURR_BACSU HYPOTHETICAL 39.4 KDA OXIDOREDUCTASE IN HOM-MRGA INTERGENIC REGION.
GP|2635760|emb|CAB15253.1||Z99120 similar to opine catabolism
PIR|A70019|A70019 opine catabolism homolog yurR

ORF02167(301 - 792 of 1161)
EGAD|109026|BS3258(1 - 164 of 372) hypothetical protein {Bacillus subtilis}
SP|O32159|YURR_BACSU HYPOTHETICAL 39.4 KDA OXIDOREDUCTASE IN HOM-MRGA INTERGENIC REGION.
GP|2635760|emb|CAB15253.1||Z99120 similar to opine catabolism {Bacillus subtilis}
PIR|A70019|A70019 opine catabolism homolog yurR - Bacillus subtilis

%Match = 16.6
%Identity = 45.2 %Similarity = 62.7
Matches = 75 Mismatches = 58 Conservative Sub.s = 29

228 258 288 318 348 378 435 

SYYD*AVET+KRLGYFSFRE*SSNKSLLPWGAIMiCKIAIIGAGAVGATLAYYLSKEKDIQVTVFDYGV-GQATKAAAGI 
|| |:||| :||: ||:|:| :||| | |||| Mill

MKSYIIVGAGILGASTAYHLAKT-GARVTVIDRKEPGQATDAAAGI 
10 20 30 40 

465 495 525 555 585 615 645 675 

ISPWFSKRRNKAVWRMARLCADFYSKLVTDLQKDGFETKPYQQTGVFLLKKDESQLESLFAIADKRRLESPLIGDLQILN

: l|:|:|lh II H 1= I « I I I h= I I |:|» = I III -I I I I I •' 

VCPWLSQRRNQDWYQIiAKGGARYYKDLIHQLEKDGESDTGYKRVGAISIHTDASKLDKMEERAYKRREDAPEIGDITRLS 
60 70 80 90 100 110 120 

705 732 762 792 822 852 882 912

KSEANTHFPEL - DGYEQLLYASGGARVEGADLTRI LXEASGVNVI KDESHFTITDKWLS CSRN* F* *TCLASGAPAS * I L 

II II I llll " II III I I I I l< I ' 

ASETKKLFPILADGYE-SVIiISGAARWGRALCRSLLSAAEI<RGAT\ r IKGNASLLFENGTvTGVQTDTKQFAADAVIVTA 
140 150 160 170 180 190 200 


SEQ ID 8680 (GBS290) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 57 (lane 6; MW 22kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 4; MW 47kDa).

GBS290-GST was purified as shown in Figure 226, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 875 

A DNA sequence (GBSx0927) was identified in S.agalactiae <SEQ ID 2657> which encodes the amino
acid sequence <SEQ ID 2658>. Analysis of this protein sequence reveals the following:

Possible site: 20
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.18 Transmembrane 38 - 54 ( 36 - 54)

Final Results

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
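
Where the analysis reports a predicted membrane-spanning segment on an "INTEGRAL Likelihood ... Transmembrane" line, the numbers can be extracted as follows. This is a sketch under stated assumptions: `parse_integral` and the dictionary keys are hypothetical names, and the second pair of coordinates is simply returned as printed in parentheses without interpreting it further.

```python
import re

# Matches e.g. "INTEGRAL Likelihood = -2.18 Transmembrane 38 - 54 ( 36 - 54)".
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)\s*"
    r"\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line):
    """Return the likelihood and both coordinate pairs, or None if no match."""
    match = INTEGRAL_RE.search(line)
    if match is None:
        return None
    likelihood, start, end, outer_start, outer_end = match.groups()
    return {
        "likelihood": float(likelihood),
        "segment": (int(start), int(end)),
        "parenthesised_range": (int(outer_start), int(outer_end)),
    }


if __name__ == "__main__":
    print(parse_integral(
        "INTEGRAL Likelihood = -2.18 Transmembrane 38 - 54 ( 36 - 54)"
    ))
```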

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD19913 GB:AF105113 glucose-1-phosphate thymidylyl transferase [Streptococcus pneumoniae]
Identities = 262/289 (90%), Positives = 276/289 (94%)

Query: 1 MKGIILAGGSGTRLYPLTRAASKQLMPIYDKPMIYYPLSVLMLAGIKEILIISTPQDLPR 60
MKGIILAGGSGTRLYPLTRA&SKQLMP4YDKPMIYYPLS LMLAGIK+ILIISTPQDLPR
Sbjct: 1 MKGriLaGGSGTRLYPLTRftASKQLMPVYDKPMIYYPLSTLMLAGIKDILIISTPQDLPR 60

Query: 61
2 GI LSYAEQPSPDG1AQAF+IGE+FIGDD VAL+LGDNIYHGPGLS ML
Sbjct: 61

Query: 121
Q+AA KE GATVFGYQVKDPERFGWEFDTDMKAISIEEKP P+SNYAVTGLYFYDNDV
Sbjct: 121 QKAAKKEKGATVFGYQVKDPERFGVVEFDTDMNAISIEEKPEYPRSNYAVTGLYFYDNDV 180

Query: 181 VEIAKNIKPSPRGELEITDVNKAYLDRGDLSVELMGRGFAWLDTGTHESLLEAAQYIETV 240
VEIAK IKPS RGELEITDVNKAYL+RGDLSVELMGRGFAWLDTGTHESLLEA+QYIETV
Sbjct: 181 VEIAKQIKPSARGELEITDVNKAYIMRGDLSVELMGRGFAWLDTGTHESLLEASQYIETV 240

Query: 241 QRMQNVQVANLEEIAYRMGYITREQVLELAQPLKKNEYGQYLLRLIGEA 289
QRMQNVQVANLEEI +YRMGYI +RE VLELAQPLKKNEYG+YLLRLIGEA
Sbjct: 241 QRMQNVQVANLEEISYRMGYISREDVTjEIAQPLKKNEYGRYLLRLIGEA 289

A related DNA sequence was identified in S.pyogenes <SEQ ID 2659> which encodes the amino acid 
sequence <SEQ ID 2660>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1585 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 207-209 
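
A motif position such as the one reported above can be checked by a plain substring scan that converts string offsets into 1-based residue numbers. The sketch below is illustrative: `find_motif` and its `first_residue` parameter are hypothetical names, and the test fragment is the subject line numbered 181 to 240 from the GAS/GBS alignment of this example, not the full-length protein.

```python
def find_motif(sequence, motif="RGD", first_residue=1):
    """Return 1-based (start, end) residue ranges for every motif occurrence.

    `first_residue` is the residue number of sequence[0], which allows a
    fragment taken from the middle of a protein to be scanned.
    """
    hits = []
    offset = sequence.find(motif)
    while offset != -1:
        start = first_residue + offset
        hits.append((start, start + len(motif) - 1))
        # Continue one position later so overlapping matches are not missed.
        offset = sequence.find(motif, offset + 1)
    return hits


if __name__ == "__main__":
    fragment = "VEIAKNIKPSARGELEITDVNKAYLERGDLSVELMGRGFAWLDTGTHESLLEAAQYIETV"
    print(find_motif(fragment, "RGD", first_residue=181))
```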

The protein has homology with the following sequences in the databases: 



Query: 1 MKGIILAGGSGTRLYPLTRAASKQLMPIYDKPMIYYPLSTLMLAGIKDVLI ISTPQDLPR 60
MKGI ILAGGSGTRLYPLTFAASKQLMP+YDKPMIYYPLSTLMLAGI+D+LI I STPQDLPR
Sbjct: 1 MKGIILAGGSGTRLYPLTRAASKQLMPVYDKPMIYYPLSTLMLAGIRDILIISTPQDLPR 60

Query: 61
F+ELD DGSEFGI LSY EQPSPDGLAQAFIIGEEFIGDD VALILGDNIYHG GL+ ML
Sbjct: 61

Query: 121
QKAA KEKGATVFGY VKDPERFGWEFDEKMNAISIEEKPE P+S++AVTGLYFYDWDV
Sbjct: 121

Query: 181
VEIAK+IKPS RGELEITDVNKAYL+RGDLSVELMGRGFAWLDTGTHESLLEA+QYIETV
Sbjct: 181

Query: 241
QR+QN QVANLEEIAYRMGYIS+EDV LAQSLKKNEYGQYLLRLIGEA
Sbjct: 241

An alignment of the GAS and GBS proteins is shown below.

Identities = 257/289 (88%), Positives = 274/289 (93%)

Query: 1 MKGIILAGGSGTRLYPLTRAASKQLMPIYDKPMIYYPLSVLMLAGIKEILIISTPQDLPR 60
MKGIILAGGSGTRLYPLTRAASKQLMPIYDKPMIYYPLS LMLAGIK++LIISTPQDLPR
Sbjct: 1 MKGIILAGGSGTRLYPLTRAASKQLMPIYDKPMIYYPLSTLMLAGIKDVLIISTPQDLPR 60

Query: 61 FEDMLGDGSELGISLSYAEQPSPDGLAQAFIIGEDFIGDDHVALVLGDNIYHGPGLSAML 120

Query: 121 QRAASKESGATVFGYQVKDPERFGVXrEPDTDMNAISIEEKPAQPKSNYAVTGLYFYDNDV 180
Q+AA+KE GATVFGYQVKDPERFGWEFD +MNAISIEEKP PKS++AVTGLYFYDNDV
Sbjct: 121 QKAAAKEKGATVFGYQVI<DPERFGVVEFDENMNAISIEEKPEVPKSHFAVTGLYFYDNDV 180

Query: 181 VEIAKNIKPSPRGELEITDVNKAYLDRGDLSVELKGRGFAWLDTGTHESLLEAAQYIETV 240
VEIAKNIKPS RGELEITDTOIKAYL+RGDLSVELMGRGFAWLDTGTHESLLEAAQYIETV
Sbjct: 181 VEIAKNIKPSARGELEITDVNKAYLERGDLSVELMGRGFAWLDTGTHESLLEAAQYIETV 240

Query: 241 QRMQNVQVANLEEIAYRMGYITREQVLELAQPLKKNEYGQYLLRLIGEA 289
QR+QN QVANLEEIAYRMGYI++E V +LAQ LKKNEYGQYLLRLIGEA
Sbjct: 241 QRLQNAQVANLEEIAYRMGYISKEDVHKLAQSLKKNEYGQYLLRLIGEA 289

There is also homology to SEQ ID 858. 

SEQ ID 2658 (GBS296) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 43 (lane 5; MW 35.4kDa). 

GBS296-His was purified as shown in Figure 203, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 876 

A DNA sequence (GBSx0929) was identified in S.agalactiae <SEQ ID 2661> which encodes the amino 
acid sequence <SEQ ID 2662>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2635 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 877 

A DNA sequence (GBSx0930) was identified in S.agalactiae <SEQ ID 2663> which encodes the amino 
acid sequence <SEQ ID 2664>. This protein is predicted to be unnamed protein product. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1868 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2665> which encodes the amino acid 
sequence <SEQ ID 2666>. Analysis of this protein sequence reveals the following: 
Final Results

bacterial cytoplasm --- Certainty=0.2818 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 29-31 

The protein has homology with the following sequences in the databases: 



Query: 1 MTETFFDKPLACREIKEIPGLLEFDIPTOGDNRGKFKENFQKEKMLPIGFPERFFEEGKL 60 

MT+ FF K LA R+++ IPG+LEFDIPV G3NRGWFKENFQKEKMLP+GFPE FF EGKL 
Sbjct: 1 MTDNFFGKTLAARKVEAIPGMLEFDIPVHGDNRGWFKEKFQKEKMLPLGFPESFFAEGKL 60 

Query: 61 QIIWSFSRQHVLRGLHAEPWDKYISVADDGKVIX^W\,T>LREGETFGlNVyQTV'rDASKGMF 120 

QNNVSFSR++VLRGLHAEPWDKYISVAD GKVLG+WVDLREGETFGN YQTVIDASKG+F 
Sbjct: 61 QNNVS FSRKNVLRGLHAEPVfflKYI S VADGGKVLGSWVDLREGETFGNTYQTVIDASKGI F 120 

Query: 121 VPRGVANGFQVLSEWSYSYLVNDYWALDLKPKYAFWyADPSLGITVffiNLAAAEVSEAD 180 

VPRGVANGFQVLS+TOSYSYLVNDYtmL+LKPKYAFVNYADPSLGI WEN+A AEVSEAD 
Sbjct: 121 VPRGVANGFQVLSDWSYSYLVITOYWALELKPKYAFWYADPSLGIEWENIAEAEVSEAD 180 

Query: 181 KNHPLLSDVKPLKPKDL 197 

K+HPLL DVKPLK +DL 
Sbjct: 181 KHHPLLKDVKPLKKEDL 197 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 157/197 (79%) , Positives = 180/197 (90%) 

Sbjct: 1 

Query: 61 QNNISFNKKNTLRGLIIAEPWDKYVSIAD3GRVIGTWVDLREGDSFGNVYQTIIDASKGIF 120 

QIJN+SF++++ LRGLHAEPWDKY+S+AD+G+V+G WVDLREG+ +FGNVYQT+ IDASKG+ F 
Sbjct: 61 QNWSFSRQHVIiRGLHAEPraKYISVADDGKVLGAWVDLREGETFGNVYQTVIDASKGMF 120 

Query: 121 VPRGVANGFQVlSDKAAYTYLVTroYlVALELKPKYAFVNYADPNLGIQWENLEEAEVSEAD 180 

VPRGVANGFQ VLS + +Y+YLVMDYWAL+LKPKYAFVNYADP+LGI WENL AEVSEAD 
Sbjct: 121 VPRGVANGFQVLSEWSYSYLVNDYWALDLKPKYAFVNYADPSLGITWENLAAAEVSEAD 180 



Query: 181 B 

KNHPLL DVKPLK +DL 
Sbjct: 181 KNHPLLSDVKPLKPKDL 197 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 878 

A DNA sequence (GBSx0931) was identified in S.agalactiae <SEQ ID 2667> which encodes the amino 
acid sequence <SEQ ID 2668>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3019 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 879 

A DNA sequence (GBSx0932) was identified in S.agalactiae <SEQ ID 2669> which encodes the amino
acid sequence <SEQ ID 2670>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 880 

A DNA sequence (GBSx0933) was identified in S.agalactiae <SEQ ID 2671> which encodes the amino 
acid sequence <SEQ ID 2672>. Analysis of this protein sequence reveals the following: 

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0957 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9367> which encodes amino acid sequence <SEQ ID 9368> 
was also identified. 

The protein is similar to the dTDP-glucose-4,6-dehydratase from S.mutans: 

>GP:BAA11249 GB:D78182 dTDP-glucose-4,6-dehydratase [Streptococcus mutans]

Identities = 290/310 (93%), Positives = 304/310 (97%) 

Query: 1 MTYAGNRANIEAILGDRVELWGDIADAFJjVDKLAAKADAIVH^ 60
+TYAGN AN+E ILGDRVELWGDIAD+ELVDKLAAl^ADAIVHYAAESHNDNSL DPSPF
Sbjct: 39 LTYAGNHANLEEILGDRVELWGDIADSELTOKIAAKADAIVHYAAESHiroNSLKDPSPF 98

Query: 61 IHTNFIGTYTLLEAARKYDIRFHHVSTDEVyGDLPLREDLPGNGEGPGEKFTAErKYNPS 120 

I+TNF+GTYTLLEAARKYDIRFHHVSTDEVYGDLPLREDLPG+GEGPGEKFTAETKYNPS 
Sbjct: 99 IYTOFVGTYTLLFJ^KYDIRFHHVSTDEVYGDLPLREDLPGHGEGPGEKFTAETKyNPS 158 


Query: 121 SPYSSTKAASDLIVKAWVRSFGVKATISNCSNNYGPYQHIEKFIPRQITNILAGIKPKLY 180 

SPYSSTKAASDLIVKAWVRSFGVKATISNCSNMYGPYQHIEKFIPRQITNIL+GIKPKLY 
Sbjct: 159 SPYSSTKAASDLIVKAWVRSFGVKATISNCSNNYGPYQHIEKFIPRQITNILSG1KPKLY 218 

Query: 181 GEGKIWRDWIHTNDHSTGWAILTKGRIGEIYIiIGADGEKNNKEVLEIjILEKMGQPKDAY 240
GEGKNVRDWIHTHDHSTGWAILTKGRIGETYLIGPJJCSEKHNKEVLELILEKM QPKDAY
Sbjct: 219 GEGKNVRDWIHT1©HSTGVWAILTKGRIGETYLIGADGEKNKI<EVLELILEKMSQPKDAY 278 

Query: 241 DHVTDRAGHDLRYAIDSTKLREELC-tffiPQFTNFSEGLEETINWYTENQDWWKAEKEAVEA 300 
DHVTDRAGHDLRYAIDSTKLREELGW+PQFTNF EGLE+TI WYTE+ +DWWKAEKEAVEA

Sbjct: 279 DHVTDRAGHDLRYAIDSTKLREELGWKPOFTNFEEGLEDTIKJJYTEHEDWWKAEKEAVEA 338 



Query: 301 NYAKTQEVIN 310 
NYAKTQ+++N 
Sbjct: 339 NYAKTQKILN 348
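
The Identities and Positives counts follow from the middle line of each alignment block: a letter marks an identical residue pair and a `+` marks a conservative substitution. A minimal sketch of that tally is given below. The function name `alignment_stats` is hypothetical, and the three strings are assumed to be column-aligned and of equal length, which is true of the short final block above but not of blocks whose spacing was lost in text extraction.

```python
def alignment_stats(query, midline, subject):
    """Count identities, positives and gap columns in one alignment block.

    Returns (identities, positives, gaps, columns). Positives include
    identities, matching the way the summary lines report them.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("query, midline and subject must be column-aligned")
    identities = sum(1 for mark in midline if mark.isalpha())
    positives = sum(1 for mark in midline if mark.isalpha() or mark == "+")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gaps, len(query)


if __name__ == "__main__":
    # Final block of the alignment above (query residues 301 to 310).
    print(alignment_stats("NYAKTQEVIN", "NYAKTQ+++N", "NYAKTQKILN"))
```

Summing these tuples over every block of an alignment would give the totals reported on its summary line, provided the blocks are intact.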



A related DNA sequence was identified in S.pyogenes <SEQ ID 2673> which encodes the amino acid 
sequence <SEQ ID 2674>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1150 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 300/309 (97%) , Positives = 303/309 (97%) 

Query: 1 MTYAGNRANIEAILGDRVELWGDIADAELVDKLAAKADAIVHYAAESfflTONSLNDPSPF 60 

+TYAGNRANIEAILGDRVEIJWGDIADAELVDK3JAAK DAIVHYAAESHNDNSL DPSPF 
Sbjct: 37 LTYAGNRANIEAIIiGDRVELWGDIADAELVDKIiAAKTDAIvHYAAESHIIDNSIjEDPSPF 96 

Query: 61 IHTNFIGTYTLLEAARKYDIRFHHVSTDEVYGDLPLREDLPGNGEGPGEKFTAETKYNPE 120 

IHTNFIGTYTLLEAARKYDIRFHHVSTDEVYGDLPLREDIiPG GEGPGEKFTAETKYNPS 
Sbjct: 97 IHTNFIGTYTLLEAARKYDIRFHHVSTDEVYGDLPLREDLPGQGEGPGEKFTAETKYNPS 156 

Query: 121 SPYSSTKAASDLIVKAWVRSFGVKATISNCSNNYGPYQHIEKFIPRQITNILAGIKPKLY 180 

E 

Sbjct: 157 £ 



Query: 241 DHVTDRAGHDLRYAIDSTKLREELGWEPQFTNFSEGLEETINWYTENQDWWKAEKEAVEA 300 
DHVTDRAGHDLRYAIDSTKLREELGWEPQFTNFSEGLEETT WYTEN+ WWKAEK+AVEA 

Query: 301 NYAKTQEVI 309 

YAKTCEVI 
Sbjct: 337 KYAKTQEVI 345 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 881 

A DNA sequence (GBSx0935) was identified in S.agalactiae <SEQ ID 2675> which encodes the amino 
acid sequence <SEQ ID 2676>. Analysis of this protein sequence reveals the following: 

Possible site: 36
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 882 

A DNA sequence (GBSx0936) was identified in S.agalactiae <SEQ ID 2677> which encodes the amino 
acid sequence <SEQ ID 2678>. Analysis of this protein sequence reveals the following: 

Possible site: 35
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -15.55 Transmembrane 13 - 29 ( 3 - 40)

Final Results

bacterial membrane --- Certainty=0.7220 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 883 

A DNA sequence (GBSx0937) was identified in S.agalactiae <SEQ ID 2679> which encodes the amino 
acid sequence <SEQ ID 2680>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2882 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 884 

A DNA sequence (GBSx0938) was identified in S.agalactiae <SEQ ID 2681> which encodes the amino
acid sequence <SEQ ID 2682>. This protein is predicted to be hyaluronate lyase. Analysis of this protein
sequence reveals the following:

Possible site: 30
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 2683> which encodes the amino acid 
sequence <SEQ ID 2684>. Analysis of this protein sequence reveals the following: 



Possible site: 46
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9099> which encodes the amino acid sequence
<SEQ ID 9100>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 23
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.300 (Affirmative) < succ>
bacterial membrane --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 353/771 (46%) , Positives = 492/771 (63%) , Gaps = 50/771 (6%) 



Query: 


307 


Sbjct: 


65 




365 


Sbjct: 


118 


Query: 


420 


Sbjct: 


175 




474 


Sbjct: 


235 




532 


Sbjct: 


295 




581 


Sbjct: 


355 


Query: 


641 


Sbjct: 


415 


Query: 


700 


Sbjct: 


475 




760 


Sbjct: 


535 



K K E +A+NI IK 



TIGNHVYDTNDSNM 364 
+D +T+LLD+WN + GN YD + +M 
--QQKDYYTELLDQWNSIIAGNDAYDKTNPDM 117 



I- SA +T TYR + 



A +E LR LR+A+MS E L LK+ IKT++T N FYNV++NLK+Y DI M +LL+ 



GMFYLYN+D HYS ++W TVNPY++ GTTE -I 



Query: 820 ASD--FVGSVKI^HFAIAA^mFTNM3RTDTAQKGWII^KIVFLGSNIKNraGIGNVS 877 
Query: 878 TTIDQRKDDSKTPYTTYVNGKTVDLKQASSQQFTDTKSVFLESKEPGRNIGYIFFKNSTI 937

TTI+QRK++ K PY +YVN + VDL FT+TKS+FLES +P +NIGY FFK +T+ 

Sbjct: 649 TTIEQRKENQKYPYCSYVIOTQPVDIiNN-QLVDFTNTKSIFLESDDPAQNIGYYFFKPTTIj 707 

Query: 938 DIERKEQTGTTOSI^TSKNTSI---VSNPFITISQKHDNKGDSYDYMMVPNIDRTSFDK 994 

I + QTG W +1 K+ VSN FITI Q H GD Y YMM+PN+ R F+ 

Sbjct: 708 S I SKALQTGKWQNI KADDKSPEAI KEVSNTF I TIKQNHTQDGDRYAYMMLPNMTRQEFET 767 

Query: 995 LANSKEVELLENSSKQQVIYDKNSQTOAVIKHDNQESLINNQFKMNKAGLY 1045 

+ +++LLEN+ K +YD +SQ VI + + ++ +N ++ G Y 
Sbjct: 768 Y I S KLD I DLLENNDKIAAVYDHDSQQMHVI HYGKKATMFSNH - NLSHQGFY 817 

SEQ ID 2682 (GBS89) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 6 (lane 3; MW 118kDa). 

The His-fusion protein was purified as shown in Figure 190, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 885 

A DNA sequence (GBSx0939) was identified in S.agalactiae <SEQ ID 2685> which encodes the amino
acid sequence <SEQ ID 2686>. This protein is predicted to be mutator MutT protein. Analysis of this protein
sequence reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3781 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA11250 GB:D78182 MutX [Streptococcus mutans] 
Identities = 132/160 (82%), Positives = 146/160 (90%), Gaps = 1/160 (0%) 

Query: 1 MTKliATICYIDNGKELLLLHRNKKENDVHEGKWISVGGKLEAGETPDECAKREILEETHL 60
M KLATICYIDNG+ELLL+HRNKK NDVHEGKWISVGGKLE GE+PDECA+REI EETHL 
Sbjct: 1 MIKrlATICYIDNGRELLL^fflR^^CKPNDvHEGraISVGGKLEKGESPDECARREIFEETHL 60 

Query: 61 WKKMDFKGVITFPEFTPGHDWYTYVFKVTDYEGELISDDESREGTLEWVPYDQVLSKPT 120 

VK+MDFKG+ITFP+FTPGHDWYTYVFKV D+EG LISD +SREGTLEWVPY+QVL+KPT 
Sbjct: 61 IVKQMDFKGI ITFPDFTPGHDWYTYVFKVRDFEGRLI SDKDSREGTLEWVPYNQVLTKPT 120 

Query: 121 WQGDYEIFKWILEDVPFFSAKFVYDEHQNLIEKTVNFYEK 160 

W+GDYEIFKWILED PFFSAKFVY E Q L++K V FYEK 
Sbjct: 121 WEGDYEI FKWILEDAPFFSAKFVYQE - QKLVDKHVIFYEK 159 
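
Each database hit is introduced by a header line of the form `>GP:<accession> GB:<accession> <description> [<organism>]`. A short sketch for splitting such a header into its fields is shown below. The name `parse_genpept_header` and the dictionary keys are hypothetical, and the pattern assumes the header sits on a single line and ends with the organism in square brackets, as in the headers reproduced in this document.

```python
import re

# Matches e.g. ">GP:BAA11250 GB:D78182 MutX [Streptococcus mutans]".
HEADER_RE = re.compile(r"^>GP:(\S+)\s+GB:(\S+)\s+(.*?)\s*\[([^\]]+)\]\s*$")


def parse_genpept_header(line):
    """Split a hit header into accessions, description and organism."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    protein_acc, nucleotide_acc, description, organism = match.groups()
    return {
        "gp": protein_acc,
        "gb": nucleotide_acc,
        "description": description,
        "organism": organism,
    }


if __name__ == "__main__":
    print(parse_genpept_header(">GP:BAA11250 GB:D78182 MutX [Streptococcus mutans]"))
```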

A related DNA sequence was identified in S.pyogenes <SEQ ID 2687> which encodes the amino acid 
sequence <SEQ ID 2688>. Analysis of this protein sequence reveals the following: 

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3399 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/158 (82%) , Positives = 146/158 (91%) 

Query: 1 MTKIATICTIDNGKELLLLHRNKKENDVHEGrailSVGGKLEAGETPDECAKREILEETHL 60

MT+LATICYIDNG LLLLHRNKKENDVH+GKWISVGGKLEAGETPDECA+REILEETHL
Sbjct: 1 MTQLAT1CYIDNGDSLLLLHRNKKENDVHKGKM-SVGGKLEAGETPDECARRE1LEETHL 60

Query: 61 TVKKMDFKGVITFPEFTPGHDJTYTYVFKVTDyEGELISDDESREGTLEWVPYDQVLSKPT 120 

TV +M FKG+ITFPEFTPGHDWYTYVFKVT +EG+LISD+ESREGTLEWVPYDQVL KPT 
Sbjct: 61 TVTEMAFKGIITFPEFTPGHDI'JYTYVFKVTGFEGDLISDEESREGTLEWVPYDQVIEKPT 120 

Query: 121 WQGDYE I FKW I LED VP F FS AKFVYDEHQNL I EKTVNF Y 158 

W+GDY+1FKWILED FFSAKF YD++ L++K+V FY 
Sbjct: 121 WEGDYDIFKWILEDRSFFSAKFTYDQNNQLMDKSVTFY 158 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 886 

A DNA sequence (GBSx0940) was identified in S.agalactiae <SEQ ID 2689> which encodes the amino 
acid sequence <SEQ ID 2690>. This protein is predicted to be MutT/nudix family protein. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1901 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 22 FGVRVSALIIENQKLLLIYAPHLDKYY-LPGGALQVGEDSNKAVAREVLEEIGLHSQVGD 80 

F R + + +++ 4LL + ++ LPGGA+Q GE S A RE EE GL + V 

Sbjct: 33 FQTRATLICTQDNRLLTCMDERFPDFFALPGGAVQTGESSAAAAQREWHEETGLRADVTR 92 

Query: 81 MYIIENQFNIKKHHYHSVEFLYFVNLLGQAPESIK3GTHKRHFVWLPIKELTKIDCNPN 140 

A+EF++ H F+VLG+P+++H FWL+L P 
Sbjct: 93 CA-TLERFFHWEGRERHEFGFFFRVELTGELPATVLDNPHV-FFRWLAVDALDDHTLYPR 150 

Query: 141 FLAQDL1EWPGHWH 155 

+ Q L G + H 
Sbjct: 151 CVPQLLRLPAGEIGH 165 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2691> which encodes the amino acid
sequence <SEQ ID 2692>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3832 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 33/80 (41%), Positives = 50/80 (62%), Gaps = 1/80 (1%)

Query: 29 LIIENQKLLLIYAPHLDKYYLPGGALQVGEDSNKAVAREVLEEIGLHSQVGDLAYIIENQ 88 

LI+ N K L D+YY GG VGE ++4 V RE LEE+G+ ++V LA+++EN 

Sbjct: 1 LIVRNGKNFLTPJ3AD-DQYYTIGGTSLVGEKTHETTORETLEEVGIRRKVNQLAFMVEMH 59 

Query: 89 FNIKRHHYHSVEFLYFVNLL 108 

F+I +H++EF Y V+ L 
Sbjct: 60 FDIDDVFWHNIEFHYLVSPL 79 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 887 

A DNA sequence (GBSx0941) was identified in S.agalactiae <SEQ ID 2693> which encodes the amino
acid sequence <SEQ ID 2694>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:



Possible site: 26 



>> Seems to 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



3 N- terminal signal sequence 



Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 
Likelihood 



= -1 



Transmembrane 294 - 31( 

Transmembrane 242 - 25S 

Transmembrane 50 - 6f 

Transmembrane 337 - 35," 



269 - 285 



Final Results

bacterial membrane --- Certainty=0.6180 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 2695> which encodes the amino acid 
sequence <SEQ ID 2696>. Analysis of this protein sequence reveals the following: 



Possible site: 26
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = ...
INTEGRAL Likelihood = -1.59
[Transmembrane coordinates only partly legible: 88 - 104 ( 85 - ...; 24 - 40 ( 21 - ...; 47 - 63 ( 41 - ...; 243 - 259 ( 237 - ...; 181 - 197 ( 178 - ...; 278 - 294 ( 273 - ...; 297 - ...; closing values ... 266), ... 203), ... 310), ... 368), ... 314).]



Final Results
bacterial membrane --- Certainty=0.4885 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the 

>GP:AAD00285 GB:U78604 putative membrane protein [Streptococcus mutans] 
Identities = 244/382 (63%) , Positives = 310/382 (80%) , Gaps = 3/382 (0%) 

Query: 12 SLFYKWFLNNQATMALVITLLAFLTIFVFTKISFLFMPVISFFAVIMLPLVISTILYYLT 71 

S F+KWFL+N+ L++ LL FL I VFTKIS +F P++SF AVIMLPLVIS +LYYL 
Sbjct: 17 SMFFKWFLDNKTVTVLLVLLLVFLDILVFTKISSrFKPLLSFLAVIMLPLVISALLYYLL 76 

Query: 72 KPLVDLINHLGPNRTTSIFIVFGLITLLFWAISGF/PWQTQLTSFIEDLPKYVGKVNE 131 
Query: 132 EANKLLElffiWLVSYKPQLQDMLTHTSC'KMjDYAQSFSKNAIDWAGNFAGAIARITVAIII 191

+ +KLL N+ L S++PQ+++ +T+ SQKA+DYA+ FSK A+ WAGNFA IAR+TVAIII 
Sbjct: 137 QVSKLLPJroLIASFRPQIENA\7TNFSQKA\TDYAEPFSKGAVTWAGNFASLIARVTVAIII 196 

Query: 192 SPFILFYFLRDSSHMKNGLVWLPLKLRVPMVRVLGD1NKQLSGYVQGQVTVAIWGFMF 251 

SPF1+FY LRDSS MK V+ LP K+R P+ R+LGD+N+QL+GYVQ TVAI+VGFMF 
Sbjct: 197 SPFIVFYLIiRDSSKMKEAFVSYLPT^QPIHRILGDVNRQLAGYVQRSSTVAIIVGFMF 256 

Query: 252 SIMFSLVGLKYAITFGIIAGFIJ^IPYLGSFLAMIPVVINmVQGPFMLVKVLVIFMIEQ 311 

SIMF+++GL+YA+TFGIIAGFLNMIPYLGSFLA IPV I+A+V+GP +VKV ++F++EQ 
Sbjct: 257 SIMFTIIGLRYAVTFGIIAGFIjNMIPYLGSFIjATIPWILALVEGPVKVVKVALVFIVEQ 316 

Query: 312 TIEGRFVAPLVLGNKLSIHPITIMFLLLTAGSMFGVWGVFLVIPIYASVKWIKELFDWY 371 

TIEGRFV+PLVLG+KLSIHPITIMF+LLTAGSMFGVWGVFL ip+yas+kw+ke+f+wy 
Sbjct: 317 TIEGRFVSPLVLGSKLSIHPITIMFILLTAGSMFGTOGVFLGIPVYASIKVWKEIFEWY 376 

Query: 372 IOCVSGLYDEEVLV1EEVKDHVK 393 

K +SGLY++E E++K VK ' 
Sbjct: 377 KPISGLYEKEE EDIKKDVK 395 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 243/389 (62%), Positives = 306/389 (78%), Gaps = 2/389 (0%) 

EKEFKNSLFFKWILNNQAVIALMITFLVFLTIFIFTKISFMFKPVFDFLAVLILPLVISG 65 
EK +SLF+KW LNNQA +AL+IT L FLTIF+FTKISF+F PV F AV++LPLVIS 
EKSRTDSLFYKWFLNMQATMALVITLLAFLTIFVFTKISFLFMPVISFFAV1MLPLVIST 65 



++ M WL SYK ++ ML++ S +A+ YA+ FSKN +DWAGN A +AR 



+TVA I++PFILFY LRDS +MKNG + VLP KLR P R+L ++N Q+SGYVQGQ+ VA 



I VG +FSIM+S++GL+Y +T GIIAG LNM+PYLGSP+A IPV I+A+V GP M+VKV 



++F+IEQT+EGRFV+PLVLGNKLSIHPITIMF+LLT+G+MFGVWGVFL IPIYAS+KW+ 
VIFMIEQTIEGRFVAPLVLGNKLSIHPITIMFLLLTAGSMFGVWGVFLVIPIYASVKVVI 364 

KELFDWYKAVSGLYTVDV-VTEERSEEVK 393 
KELFDWYK VSGLY +V V EE + VK 
KELFDWYKKVSGLYDEEVLVIEEVKDHVK 393 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 888 

A DNA sequence (GBSx0942) was identified in S.agalactiae <SEQ ID 2697> which encodes the amino 
acid sequence <SEQ ID 2698>. Analysis of this protein sequence reveals the following: 








>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2715 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9891> which encodes amino acid sequence <SEQ ID 9892> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 79 INLAQIVAEDGDIEQAFLYLDYISEDSQEYVSALLWIADLYDMEGLTDVAREKLLLASKL 138 

+NLA+I ++G++++A YL I 4 4 Y++AL+ +ADLY E + A KL A +L 
Sbjct: 1 VNLAEIAEDNGNLDEALNYLYQI PVNDENYIAALI KIADLYQFEVDFETAI SKLEEAREL 60 

Query: 139 SDDPLVTFGLAEMNLSLEHYQEAIEGYASLDNREILETTGVSTYQRIGKSYAIMGKFDAA. 198 

SD PL+TF LAE Y AI YA Ii R+IL T +S YQRIG SYA +G F+ A 

Sbjct: 61 SDSPLITFALAESYFEQGDYSAAITEYAiOjSERKILHETKISIYQRIGDSYAQLGNFENA 120 

Query: 199 IEFLEKAVDIEYDDLTVFELATILYDQEEYQKANLYFKQLDTINPDFAGYEYIYGLSLRE 258 

I FLEK444 + T4444A + + +A FK+L+ 44 +F YE Y +L 

Sbjct: 121 ISFLEKSLEFDEKPETLYKIALLYGETHNETRAIANFKRLEKMDVEFM^IAYAQTLFA 180 

Query: 259 EHKSEEALRLVQQG1RKNSFDGQLLLIASQLSYELHDVHSSESYLKQAEKVSENQDE1VM 318 

+ + AL + ++G++KN LL AS++ +4L D ++E YL A + E DE V 

Sbjct: 181 NQEFKAALEMAKKGMKKNPNAVPLLHFASKICFKLKDKAAAERYLVDALNLPELHDETVF 240 

Query: 319 RLSNLYLEEERFEE VLELDN - DNLENILAKWNIAKAHKALEMDDSVD - - YYQSLYNDLKD 375 

L+NLY EE FE V+ L+ E++LAKW A AHKALE D Y + + +L 4 

Sbjct: 241 LIANLYFNEEDFEAVINLEELLEDEHLLRKWLFAGAHKALENDSFAAALYEELIQTNLSE 300 

Query: 376 NPEFLQDYAYILREFGYIX1KAQEVGKAYLKLVPDDIEMSEWVNNI 420 

NPEFL+DY L+E G + K 4 + + YL4LVPDD M 4 44 
Sbjct: 301 NPEFLEDYIDFLIQ3IGQISKTEPIIEQYLELVPDDENMRNLLTDL 345 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2699> which encodes the amino acid 
sequence <SEQ ID 2700>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2991 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 267/409 (65%), Positives = 336/409 (81%), Gaps = 1/409 (0%) 

Query: 13 MLNSEEOTIVSIQNQDIjEHANKYFEKAIjKNDPEEVIjLEIjGAYLESIGFLPQAKRLYDQIRP 72 

MLNSEKMI S4 QDL HA KYF4KALK D 4 L4 LG YLESIGFLP AKR4Y Q4 
Sbjct: 7 MMSEKMlASLDQQDUmAEKYFQKALKEDDADSLIALGEYLESIGFLPHAKRIYLQLAD 66 

Query: 73 NYPEVAINLRQIVAEDGDIEQAFLYLDYISEDSQEYVSALLVMADLYDMEGLTDVAREKL 132 

4YPE4 INLAQI AED IE4AFLYLD 4S4DS Y4SALLVMADLYDMEGLT4VAREKL 
Sbjct: 67 DYPEIjNIlSn^QIAAEDDAIEFAFLYLDKVSKDSPNYLSALLVMADLYDMEGLTEVAREKL 126 

Query: 133 LIASKLSDDPLVTFGLAEMNLSLEHYQEAIEGYASLDNREILETTGVSTYQRIGKSYAIM 192 

LA 4S 4PLV FGLAE444SL+H44EAI4 YA LDNR4ILE TG4STYQRIG44YA 4 
Sbjct: 127 LQAVGISPEPLVIFGIiftEIDMSLQHFKEAIDYYAQLDNRQILELTGISTYQRIGRAYASL 186 
Sbjct: 187 CSKFE^IEFLEKAVAIEYEDETVFELATLMYDQENYQKANLYFKQLETINPDYPGYEYGY 246

Query: 253 GLSLREEHKSEEALRLVQQGIRKNSFDGQLLLIASQLSYELHDVHSSESYLKQAEKVSEN 312 

LSL EEHK+ EALRLVQQG+RKN+FD QLLLLASQLSYELHD ++E+YL QA++V+ + 
Sbjct: 247 ALSLHEEHKTSEALRLVQQGLRKNAFDSQLLLLASQLSYEI.HDRQNAENYLLQAKEVAVD 306 

Query: 313 QDEIVmLSHLYLEEERFEEVLELDITOltt^^ 371 

+EI+MRL LY + ERFEEV+ L+ + ++N+L KW IAKA+ ALE 4+ ++ Y + 
Sbjct: 307 DEEIIJ^LVTLYFDAERFEEVIALMRETlDNW^TKWTIAKAYHALEQEEVALALYiroiSA 366 

Query: 372 DLKDNPEFLQDYAYrLREFGYLDKAQEVGKAYLKLVPDDIEMSEWVMNI 420

DL +NPEFLQDYAY+LREFG KA ++ AYL+ VPDD+ M 
Sbjct: 367 DLAENPEFLQDYAYLLREFGQFHKAIQMATAYLRQVPDDVNMQDFLDHI 415 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 889 

A DNA sequence (GBSx0943) was identified in S.agalactiae <SEQ ID 2701> which encodes the amino 
acid sequence <SEQ ID 2702>. This protein is predicted to be alpha-acetolactate synthase (ilvK). Analysis 
of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2105 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA01700 GB:A23961 alpha-acetolactate synthase [Lactococcus 

Identities = 396/559 (70%) , Positives = 466/559 (82%) , Gaps = 8/559 (1%) 

Query: 4 SHNQYGADLIVDSLINHDVKYVFGIPGAKIDRVFDTLE-DKGPELIVARHEQNATFMAQA 62 

S Q+GA+L+VDSLINH VKYVFGI PGAKIDRVFD LE ++GP+++V RHEQ A FMAQA 
Sbjct: 2 SEKQFGAra I WDSLINHKVKYVFGIPGAKIDRVFDLLEl^EGPQI«rm'RHEQGAAFMAQA 61 

Query: 63 VGRITGEPGWIATSGPGISNr^TGLVTATDEGDAVIAIGGQVKRGDLLKRAHQSMNNVA 122 

VGR+TGEPGW+ TSGPG+SNLAT L+TAT EGDA+LAIGGQVKR D LKRAHQSM+N 
Sbjct: 62 VGRLTGEPGVVVVTSGPGVSNLATPLLTATSEGDAILAIGGQVKRSDRLKRAHQSMDNAG 121 

Query: 123 MLEPITICYSAEVHDPNTLSETVANAYRLAKSGKPGAS F I S I PQDVTDSPVSVKAI KPLSA 182 

M++ TKYSAEV DPNTLSE++ANAYR+AKSG PGA+F+SIPQDVTD+ VS+KAI+PLS 
Sbjct: 122 MMQSATKYSAE VLDPNTLSES I ANAYRI AKSGHPGATFLS I PQDVTDAEVS I KAI QPLSD 181 

Query: 183 PKLGSASVLDINYIAQAINNAVLPVLL^^ 242 

PK+G+AS+ DINYIAQAI NAVLPV+L+G GAS V +-H-R LL V +PWETFQGAG 
Sbjct: 182 PKMGNASIDDINYLRQAIKNAVLPVILVGAGASDAKVASSLRNLLTHVNIPVVETFQGAG 241 

Query: 243 IVSRELEDETFFGRVGIiFRNQPGDMLliKRADLVIAIGYDPIEYEARNWNAEISARIIVID 302 

++S +LE TF+GR+GLFRNQPGDMLLKR+DLVIA+GYDPIEYEARNWNAEI +RIIVID 
Sbjct: 242 VISHDLE-HTFYGRIGLFRNQPGDMLLKRSDLVIAVGYDPIEYEARNWNAEIDSRIIV1D 300 

Query: 303 VEQAEIDTYFQPEREMGDMAHTLDLLLPAIKGYELPEGSKEYLKGLRNNIENVSDVKFD 362 

AEIDTY+QPERELIGD+A TLD LLPA++GY++P+G+K+YL GL E +FD 
Sbjct: 301 NAIAEIDTYYQPERELIGDIAATLDNLLPAVRGYKIPKGTKDYLDGLH EVAEQHEFD 357 

Query: 363 RDSA-HGLvHPLDLIDVXQENTTDDMTVTTOVGSHYIWMftRYFKSYEARHLLFSNGMQTL 421 

++ G +HPLDL+ QE DD TVTVDVGS YIWMAR+FKSYE RHLLFSNGMQTL 
Sbjct: 358 TENTEEGRiVIHPIiDLVSTFQEIvTQ3DETVTVDVGSLYIM»IARHFKSYEPRHLIjFSNGMQTL 417 
Query: 422 GVALPWAISAALLRPNTKvT SVSGDGGFLFSAQELETAVRLHLPIVHI IWNDGKYNMVEF 481

GVALPWAI +AALLRP KV S SGDGGFLF+ QELETAVRL+LPIV IIWNDG Y+MV+F 
Sbjct: 418 GVALPMAITAALLRPGKKVYSHSGDGGFIjFTGQELETAWIiNI.PIVQIIWM)GHYDMVKF 477 

Query: 482 QEEMKYGRSSGVDFGPVDFVKYAESFGAKGYRVDSKDSFEETLKQALIDAEKGPVLIDVP 541 

QEEMKYGRS+ VDFG VD+VKYAE+ AKGYR SK+ E LK I GPV+IDVP 
Sbjct: 478 QEEMKyGRSAAVDFGYVDYVKYAEAMRAKGYRAHSKEELAEILKS--IPDTTGPWIDVP 535 

Query: 542 IDYKDNVTLGETILPDEFY 560 

+DY DN+ L E +LP+EFY 
Sbjct: 536 LDYSDNIKLAEKLLPEEFY 554 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 890 

A DNA sequence (GBSx0944) was identified in S.agalactiae <SEQ ID 2703> which encodes the amino 
acid sequence <SEQ ID 2704>. This protein is predicted to be alpha-acetolactate decarboxylase (aldC). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3095 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9889> which encodes amino acid sequence <SEQ ID 9890> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA57941 GB:X82620 alpha-acetolactate decarboxylase [Lactococcus
lactis] 

Identities = 139/239 (58%), Positives = 187/239 (78%), Gaps = 3/239 (1%) 

Query: 16 MSETVKLFQYSTLSSL^GLYKGSLTIGELLTHGDLGIGTVHMIDGELIVLDGKAYQAIG 75 

MSE +LFQY+TL +LMAGLY+G++TIGELL HGDLGIGT+ IDGELIVLDGKAYQA 
Sbjct: 1 MSEITQLFQYNTLGALMAGLYEGTMTIGELLKHGDLGIGTLDSIDGELIVLDGKAYQA-- 58 

Query: 76 TDGKAEIIQLSDDVTVPYAAVLPHHIQKQFDINAEIDWKDLEEMILKNFEGQNLFKSLKI 135 

G I++L+DD+ VPYAAV+PH + F + +K+LE+ I F+GQNLF+S+KI 

Sbjct: 59 -KGDKTIVELTDDIKVPYAAWPHQAEWFKQKFTVSDKSLEDRIESYFDGQNLFRSIKI 117 



Query: 136 E 

G F +MHVRMIP++ +F +++ NQPE+T EN+ GT+VGIWTPE+FHGV V G+H+H 
Sbjct: 118 TGKFPKMHVRMIPRAKSGTKFVEVSQNQPEYTEENIKGTIVG1WTPEMFHGVSVAGYHLH 177 

Query: 196 FISDDLTFGGHVMDYSLTQGKVEIGKVDQIJIlQCFPTQDQEFLKANFDliQKLREDIDLSE 254 

FIS+D TFGGHV+D+ + G VEIG +DQL+Q FP QD++FL A+ C++ L++DID++E 
Sbjct: 178 FISEDFTFGGHVLDFIIDNGTVEIGAIDQLNQSFPVQDRKFLFADLD1EALKKDIDVAE 236 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 891

A DNA sequence (GBSx0945) was identified in S.agalactiae <SEQ ID 2705> which encodes the amino 
acid sequence <SEQ ID 2706>. This protein is predicted to be fibronectin-binding protein-like protein A. 
Analysis of this protein sequence reveals the following: 

Possible site: 57
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.5042 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA46282 GB:X65164 fibronectin-binding protein-like protein A
[Streptococcus gordonii]

Identities = 392/550 (71%) , Positives = 462/550 (83%) 

Query: 1 MSFDGFFLHHLTNELQEQIEKGRIQKVNQPFDHELVLTIRNNRRNYKLLLSAHPVFGRIQ 60 
MSFDGFFLHH+T EL+ ++ GRIQK+NQPF+ ELVL IR+NR++ KLLLSAH VFGR+Q 
Sbjct: 1 MSFDGFFLHHMTEELRHELVGGRIQKINQPFEQELVLQIRSNRKSLKLLLSAHSVFGRVQ 60



Query: 61 TTEANFQNPQNPNTFTMIMRKYLQGAVIETlQQIEimRILEIWSNKNEIGDHIKATLW 120 

T+ F+NP PNTF M+MRKYLQGAVTE IQQ+ENDRILEI VSNKNEIGD + TT.V+ 
Sbjct: 61 LTDTTFENPAVPNTFIMVMRKYLQGAVIEAIQQVENDRILEISVSNKNEIGDSVAVTLVI 120 

Query: 121 EIMGKHSNI ILIDKNEHKI IES IKWGFSQNSYRTILPGSTYIAPPKTKAINPFDISDQT 180 

EIMGKHSNIIL+DK KIIE+IKHVGFSQNSYRTILPGSTy+APP+T ++NPF + D+ 
Sbjct: 121 EIMGKHSNI ILLDKASGKIIEAIKHVGFSQNSYRTILPGSTYVAPPQTGSLNPFTVGDEK 180 

Query: 181 LFELLQTNDLSPKNLQQLLQ^I/SRDTALELSHCLKDNKLNDFRQFFSREYYPSLTEKSFS 240 

LFE+LQT ++ PK L Q+ QGLGRDTA ELS L Hi FR FF+ PSLTEKSFS 
Sbjct: 181 LFEILQTEEIEPKRLLQIFQGLGRDTATELSGRLTTDRLKTFRAFFASPTQPSLTEKSFS 240 

Query: 241 AVQFSSSHETFQSLGQLLDYyYQEKAEKDRIAQQASDLIHRVQSELEKNIKKLAKQQDEL 300 

A+ FS S +L +LLD +Y++KAE+ R+ QQAS+LI RV++ELEKN KKL KQ+DEL 

Sbjct: 241 ALVFSDSKTQMSTLSELLDTFYKDKAERYRWQQASELIRRVENELEKNRKKLGKQEDEL 300 



Query: 301 IATENAEEFRQKGELLTTYLSMVPNNQDVWLDNYYTNQTIEISLDRALTPNQNAQRYFK 360 
LATE AEEFRQKGEIiLTT+L VPN+QD V LDNYYT ■)■ I I+LD+ALTPNQNAQRYFK 
Sbjct: 301 LATEKAEEFRQKGELLTTFLHQVPNDQDQVELDNYYTGEKILITLDKALTPNQNAQRYFK 360







Query: 361 KYQKLKEAVKHLKGIISDTENTITYLESVETSLNHASMEDINDIREELVETGFIKRRAHD 420
           +YQKLKEAVKHL +1 +T TI YDESVET+L AS+ +1 + IREEL+ +TGFI +RR +
Sbjct: 361 RYQKLKEAVKHLTSLIEETRTTILYLESVETALAQASLTEIAEIREELIQTGFIRRRQRE 420

Query: 421 KQHKRKKPEQYLASDGKTIIWGRNNLQNDELTFKMARKGELWFHAKDIPGSHVLIRDNL 480
           K KRKKPE+YLASDG+TI I +VGRNNLQNDELTFKMA+K ELWFHAKDIPGSHV+I NL
Sbjct: 421 KIQKRKKPEKYIASDGQTIILVGRNNLQNEELTFIOTAKroELWFHAKDIPGSHWITGNL 480

Query: 481 NPSDEVKTDAAEIAAYYSKARLSNLVQVDMIFJIKKI^KPSGTKPGFVTYTGQKTLRVTPT 540
           PSDEVKTDAAEIiAAY+SKARLSNLVQVDMIE KKLNKP+G KFGFVTYTGQKTLRVTP
Sbjct: 481 QPSDEVKTDAAELAAYFSKARLSNLVQVDKIEIKKLHKPTGGKPGFVTYTGQKTLRVTPD 540

Query: 541 QEKIDSLKLK 550
           +KI S+K++
Sbjct: 541 ADKIKSMKIQ 550





A related DNA sequence was identified in S.pyogenes <SEQ ID 2707> which encodes the amino acid 
sequence <SEQ ID 2708>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence






Final Results 

bacterial cytoplasm --- Certainty=0.5434 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein differs significantly from L28919 in its mid-region: 

Query: 223 QHFQGLGRDTAKELAELLTTD 
F L +T K + ELLTTD 
Sbjct: 121 PAFSRLRGETPKRIGELLTTD 
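
How far the two mid-region segments diverge can be quantified by counting identical aligned positions. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, the two strings are copied from the excerpt above, and the comparison is a plain position-by-position match rather than any particular alignment program's scoring.

```python
def count_identities(query, sbjct):
    """Count positions where two equal-length aligned strings carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    # Gap characters ("-") never count as identities.
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# Aligned segments copied from the mid-region excerpt above.
query = "QHFQGLGRDTAKELAELLTTD"
sbjct = "PAFSRLRGETPKRIGELLTTD"

identical = count_identities(query, sbjct)
print(f"{identical}/{len(query)} identical positions")
```

For this excerpt the count is 10 of 21 positions, with the matches concentrated in the C-terminal ELLTTD stretch.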

An alignment of the GAS and GBS proteins is shown below. 

Identities = 421/549 (76%) , Positives = 487/549 (88%) 

Query: 1 MSFDGFFI^HLTNELQEQIEKGRIQKOTQPFDHELVLTIRHNRRNYKLLLSAHPVFGRIQ 60 

MSFDGFFLHHLTNEL+E + GRIQKVNQPF-h ELVLTIRN+R+NYKLLLSAHPVFGR+Q 
Sbjct: 27 MSFDGFFLHHLTNELKENLLYGRIQKA'NQPFEREL'VLTIRNHRKNYKLLLSAHPVFGRVQ 86 

Query: 61 TTEANFQNPQNPNTFTMIMRKYLQGAVIETIQQIENDRILEIWSNKNEIGDHIKATLW 120 

T+A+FQNPQ PNTFTMIMRKYLQGAVIE ++QI+NDRI+EI VSNKNEIGD I+ATL++ 
Sbjct: 87 ITQADFQNPQVPNTFTMIMRKYLQGAVIEQLEQIDNDRIIEIKVSNKNEIGDAIQATLII 146 

Query: 121 EIMGKHSNIILIDKNEHKIIESIKHVGFSQNSYRTILPGSTYIAPPKTKAINPFDISDQT 180 

EIMGKHSN1 IL+D+ E+KIIESIKHVGFSQNSYRTILPGSTYI PPKT A+WPF I+D 
Sbjct: 147 EIMGKHSNIILVDRAEWKriESIKHVGFSQNSyRTILPGSTYIEPPKTAAVNPFTITDVP 206 

Query: 181 LFELLQTNDLSPKMiQQLLQGLGRDTALELSHCLKDNKLNDFRQFFSREYYPSLTEKSFS 240

LFE+LQT +L+ K+LQQ QGLGRDTA EL+ L +KL FR+FF+R +LT SF+ 
Sbjct: 207 LFEILQTQELTVKSLQQHFQGI/SRDTAKEIAELLTTDKLKRFREFFARPTCANLTTASFA 266 

Query: 241 AVQFSSSHETFQSLGQLLDYYYQEKAEKDRIAQQASDLIHRVQSELEKNIKKIAKQQDEL 300 

V FS SH TF++L +LD++YQ+KAE+DRI QQASDLIHRVQ+EL+KN KL+KQ+ EL 
Sbjct: 267 PVLFSDSHATFETLSDMLDHFYQDKAERDRINQQASDLIHRVQTELDKNRNKLSKQEAEL 326 

Query: 301 IATENAEEFRQKGELLTTYLSMVPIWQDWVIjDKrYYTNQTIEISIjDRALTPNQNAQRyFK 360

LATENAE FRQKGELLTTYLS+VPNNQD V+LDNYYT + IEI+LD+ALTPNQNAQRYFK 
Sbjct: 327 LATENAELFRQKGELLTTYLSLVPNNQDSVILDNYYTGEKIEIALDKALTPNQNAQRYFK 386 



Query: 481 NPSDEVKTDAAELAAYYSKARLSNLVQVDMIEAKKI^KPSGTKPGFVTYTGQKTLRVTPT 540 

+PSDEVKTDAAELAAYYSKARLSNLVQVDMIEAKKL+KPSG KPGFVTYTGQKTLRVTP 
Sbjct: 507 DPSDEVKTDAREIiAAYYSKARLSNLVQVDMIEAKJXHKPSGAKPGFVTYTGQKTLRVTPD 566 

Query: 541 QEKIDSLKL 549 

Q KI S+KL 
Sbjct: 567 QAKILSMKL 575 

SEQ ID 2706 (GBS81) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 6 (lane 2; MW 64kDa) and in Figure 6 (lane 5; MW 64kDa). The GBS81-His 
fusion product was purified (Figure 190, lane 3) and used to immunise mice. The resulting antiserum was 
used for FACS (Figure 319), which confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 




Example 892 

A DNA sequence (GBSx0946) was identified in S.agalactiae <SEQ ID 2709> which encodes the amino 
acid sequence <SEQ ID 2710>. Analysis of this protein sequence reveals the following: 

Possible site: 53
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.08 Transmembrane 6 - 22 ( 1 - 24)

Final Results
bacterial membrane --- Certainty=0.4630 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF94260 GB:AE004191 conserved hypothetical protein [Vibrio cholerae]
Identities = 111/295 (37%) , Positives = 184/295 (61%) , Gaps = 1/295 (0%)



Query: 36 QWKIGILQYOTHDALDAIEKGVEDGIAQEGYX-GKKVKLTVLNAEADQSKIQAMSKQLV 94 

+ K+ + Q V H ALDA +G+ DGL +GY+ GK ++ A+ + + +++Q V 

Sbjct: 26 KTAKVAVSQIVEHPALDATRQGLLDGLKAKGYSEGKNLEFDYKTAQGNPAIAVQIARQFV 85 

Query: 95 NHHNDILIGIATPSAQGIAASTKDTPIIMGAVSDPLGAKLVTNMKKPTTNVTGLSNVVPT 154 

+ D+L+GIATP+AQ L ++TK PI+ AV+DP+GAKLV 4-++P NVTGLS++ P 
Sbjct: 86 GENPDVLVGIATPTAQALVSATKTIPIVFTAVTDPVGAKLVKQLEQPGKNVTGLSDLSPV 145 

Query: 155 KQTVQLIKDITPNIKRIGILYASSEDNSVSQVTEFTKYAQKAGLEVLKYSVPSTNEIKTS 214

+Q V+LIK+I PN+K IG++Y E N+VS + A K G+++++ + + +++++ 

Sbjct: 146 EQHVELIKEILPOTKSIGWYNPGERNAVSLMEIJ^SAAKHGIKLVEATALKSADVQSA 205 

Query: 215 MSVMTKKVDAVEVPQDNTIASAFRTVIVAANOANIPVYSSVDTMVEQGSIASVAQSQYGL 274 

+ +K D ++ DNT+ASA +IVAANQA PV+ + + VE+G+IAS+ Y + 
Sbjct: 206 TQAIAEKBDVIYALIDNTVASAIEGMIVAANQAKTPVFGAATSYvERGAIASLGFDYYQI 265 

Query: 275 GLETAKQAIKVLRGKPVKDVPVKVIDTGKPSLNLKAAKHLGIKIPKKIMKQAEIT 329 

G++TA +L GK + V+V +N AR.+ LGI IP+ ++ +A T 

Sbjct: 266 GVQTADYVAAILEGKEPGSLDVQVAKGSDLVINKTAAEQLGITIPEAVLARATST 320 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2711> which encodes the amino acid
sequence <SEQ ID 2712>. Analysis of this protein sequence reveals the following: 

Possible site: 23 



Final Results
bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAF94260 GB:AE004191 conserved hypothetical protein [Vibrio cholerae]

Identities = 103/304 (33%) , Positives = 178/304 (57%) , Gaps = 1/304 (0%) 

Query: 17 VIGSLLSKGVSKENRDLANQQNITIGILQFVTHEALDD1KRGIEDQLK-KQMPQKQNW1 75 
VI + +G++ + + + QVH ALD ++G+ D LK K + +N+ 

Sbjct: 6 VIATAVrAGAALLSSQSIMAKTAKVAVSQIVEHEALDATRQGLLDGLKAKGYEEGKNLEF 65



Query: 76 KVMNAEGDQSKIQTMSRQLVQSGSDIVIGIATPAAQGLAATSKDIPVVMSAVSDPVGSRL 135 

A+G+ + ++RQ V D+++GIATP AQ L + +K IP+V +AV+DPVG++L 

Sbjct: 66 DYKTAQGNPAIAVQIARQFVGENPDVLVGIATPTAQALVSATKTIPIVFTAVTDPVGAKL 125 

Query: 136 WQLDQPEANVTGLSNKVPWQTIDLMKKLTPHVKTVGILYASl^DNSLSQVKEFRRLAR 195 
V QL+QP NVTGLS+ PV+Q ++L+K++ P+VK++G++Y E N4+S ++ + A
Sbjct: 126 VKQLEQPGKNVTGLSDLSPVEQHVELIKEILPNVKSIGWYNPGEMAVSLl^LLKLSAA 185 

Query: 196 KKGYQVISYAVPSTNEVPATMSVMLGKVDAVFIPQDMTIASAFSSVMTTSKAAKIPVYTS 255 
K G +++ + +V + + K D ++ DNT+ASA ++ + AK PV+ +

Sbjct: 186 KHGIKLVEATALKSADVQSATOAIAEKSDVIYALinNTVASAIEGMIVAANQAKTPVFGA 245 

Query: 256 VDRMVEKGGLAAISQNQYDLGVQTANQVLI^IKGKRVVDVPVKWDIGQPLINKNVAAEL 315 
VE+G +A++ + Y +GVQTA+ V +++GK + V+V +INK A +L 

Sbjct: 246 ATSYVERGAIASLGFDYYQIGVQTADYVAAILEGKEPGSLDVQVAKGSDLVINKTAAEQL 305

Query: 316 GIAI 319 

Sbjct: 306 GITI 309 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 181/322 (56%) , Positives = 252/322 (78%) , Gaps = 1/322 (0%) 

Query: 1 MKNKGLIATLILLTILWGELFYNK-SSICRLNLSEKQWKIGILQYVTHDALDAIEKGVE 59 
MKHK LIATL++LT++V+G L S++ +L+ +Q + IGILQ+VTH+ALD I++G+E

Sbjct: 1 MKNKSLIATLLVLTVIVIGSLLSKGVSKEMRDLANQQNITIGrLQFVTHEALDDIKRGIE 60 

Query: 60 DGLAQEGYKGKKVKLTVLNAEADQSKIQAMSKQLVNHHHDILIGIATPSAQGIAASTKDT 119 
D L ++ + + V + V+NAE DQSKIQ MS+QLV +DI+IGIATP+AQGLAA++KD 
Sbjct: 61 DQLKKQMPQKQNVVIKVMNAEGDQSKIQTMSRQLVQSGSDIVIGIATPAAQGLAATSKDI 120

Query: 120 PIIMGAVSDPLGAKLVTNMKKPTTim'GLSmWPTKQTVQLIKDITPNIKRIGILYASSE 179 

P++M AVSDP+G++LV + +P NVTGLSN VP KQT+ L+K +TP++K +GILYAS+E 
Sbjct: 121 PVVMSAVSDPVGSRLVI'IQLDQPEANVTGnSNKVPVKQ'riDLMKKLTPHVKTVGILYASNE 180 


Query: 180 DNSVSQVTEFTKYAQKAGLEVLK^SVPSWffilKTSMSVMTKKVDAVFVPQDNTIASAFRT 239 

DNS+SQV EF + A+K G +V+ Y+VPSTNE+ +MSVM KVDAVF+PQDNTIASAF + 
Sbjct: 181 DNSLSQVKEFRRIARKI<GYQVISYAVPSTNEVPATMSVMLGKVDAVFIPQDNTIASAFSS 240 

Query: 240 VIVAflNQANIPVYSSVDTMVEQGSIASVAQSQYGI.GLETAXQAIKVLRGKPVKDVPVKVI 299

V+ + A IPVY+SVD MVE+G +A+++Q+QY LG++TA Q +K+++GK V DVPVKV+ 
Sbjct: 241 VMTTSKAAKIPVYTSVDRMVEKGGLAAISQNQYDLGVQTANQVLKLIKGKRVVDVPVKVV 300 

Query: 300 DTGKPSLNLKAAKHLGIKIPKK 321 
D G+P +N A LGI I K+

Sbjct: 301 DIGQPLINKNVAAELGIAIKKE 322

SEQ ID 2710 (GBS254) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 43 (lane 4; MW 27kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 3; MW 59.6kDa).

GBS254-GST was purified as shown in Figure 203, lane 6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 893 

A DNA sequence (GBSx0947) was identified in S.agalactiae <SEQ ID 2713> which encodes the amino
acid sequence <SEQ ID 2714>. This protein is predicted to be probable permease of ABC transporter
(rbsC). Analysis of this protein sequence reveals the following:

Possible site: 24
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-15.12 Transmembrane 127 - 143 ( 119 - 151)
INTEGRAL Likelihood = -8.81 Transmembrane 206 - 222 ( 200 - 227)
INTEGRAL Likelihood = -6.48 Transmembrane 260 - 276 ( 258 - 282)
INTEGRAL Likelihood = -5.84 Transmembrane 234 - 250 ( 231 - 257)
INTEGRAL Likelihood = -4.78 Transmembrane 55 - 71 ( 54 - 72)
INTEGRAL Likelihood = -3.61 Transmembrane 177 - 193 ( 176 - 194)
INTEGRAL Likelihood = -3.35 Transmembrane 84 - 100 ( 83 - 102)
INTEGRAL Likelihood = -1.91 Transmembrane 10 - 26 ( 10 - 26)



Final Results 

bacterial membrane --- Certainty=0.7050 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
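
The "Final Results" block assigns a certainty to each candidate localisation, and the predicted compartment is the one with the highest value. The following is a minimal sketch of reading such a block, not the prediction program itself: the function names are hypothetical, and the regular expression is written on the assumption that stray spaces may appear inside the numbers, as they do in extracted copies of this text.

```python
import re

# Matches e.g. "bacterial membrane --- Certainty=0.7050 (Affirmative)",
# tolerating extra spaces around "=" and "." introduced by text extraction.
RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for m in RESULT.finditer(text):
        certainty = float(f"{m.group(2)}.{m.group(3)}")
        results[m.group(1)] = (certainty, m.group(4).strip())
    return results


def best_location(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial membrane --- Certainty=0 . 7050 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(best_location(results), results)
```

Applied to the block above, this selects the membrane localisation, in line with the eight predicted transmembrane segments.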



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG07224 GB:AE004801 probable permease of ABC transporter 
[Pseudomonas aeruginosa] 
Identities = 116/288 (40%), Positives = 185/288 (63%), Gaps = 9/288 (3%)

Query: 2 IISSVSQGLLWGILGLGIYLTPRILKFPDMTTEGSFPLGGAVCVTLMNQGVNPILATILG 61 

+ ++ GL++ ++ LG++++FR+L+FPD+T +GSFPLGGAVC TL+ G +P AT+ 
Sbjct: 6 LFGALEIGLIFSLV/ALGVFISFRLLRFPDLTVDGSFPLGGAVCATLIALGWDPYSATLAA 65 

Query: 62 MLSGMLAGFVTGLLYTKGKIPTILAGILVMTSCHSIMLMVMKRANLGLNEIQTLKDFLPF 121 

+G LAG TGLL K Kt +LA IL+M + +SI h +M + N+ L TL L 
Sbjct: 66 TAAGAIAGLATGLLNVKLKIMDLLASILMMIALYSINLRIMGKPNVPLIAEPTLFTLLQP 125 



Query: 122 SNDLNLLVLGLIAILLVISA-— LIYFLYTRLGQAYIATGDNPDMAKSFGIDTDKMEMLG 178 

+ + L+ + +VI+A L +F T+ G A ATG NP MA++ G++T M +LG 
Sbjct: 126 EWLSDYVFRPLLLVFIVIAAKLLLDWFFTTQKGLAIRATGSNPRMARAQGVNTGGMILLG 185 



Query: 179 LIVSNGLIALSGALVSQQDGYADVSKGIGVIVIGLASIIIGE-VLYSTGLTLFERLIAIV 237 
+ +SN L+AL+GAL +Q G AD+S GIG IVIGLA++I+GE +L S L L +A++ 
Sbjct: 186 MAI SNALVALAGALFAQTQGGADI SMGIGTIVIGLAAVTVGESILPSRRLI L - - ATLAVI 243



Query: 238 VGSILYQFL1TAVI---ALGFNTNYLKLFSAIVLGICLMVPVLKTKIL 282 

+G+I+Y+F I + +G L L +A+++ + L++P+4-K ++L 

Sbjct: 244 LGAIWRFFIALAU^SDFIGLQAQDLNLVTAVLVTVALVIPMMKKRLL 291 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2715> which encodes the amino acid
sequence <SEQ ID 2716>. Analysis of this protein sequence reveals the following:



Possible site: 55
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-10... Transmembrane ... - 147 ( 125 - 156)
INTEGRAL Likelihood = ... Transmembrane ... - 226 ( 204 - 230)
INTEGRAL Likelihood = ... Transmembrane ... - 281 ( 261 - 283)
INTEGRAL Likelihood = ... Transmembrane ... - 254 ( 233 - 261)
INTEGRAL Likelihood = ... Transmembrane ... - 105 ( 87 - 107)
INTEGRAL Likelihood = ... Transmembrane ... - 79 ( 62 - 79)
INTEGRAL Likelihood = ... Transmembrane ... - 196 ( 180 - 198)
INTEGRAL Likelihood = ... Transmembrane ... - 30 ( 14 - 30)



Final Results
bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAG07224 GB:AE004801 probable permease of ABC transporter
[Pseudomonas aeruginosa]
Identities = 118/285 (41%), Positives = 186/285 (64%), Gaps = 7/285 (2%) 

Query: 6 IISSVSQGLIWGVLGLGIYLTFRILNFPDMTTEGSFPLGGAVAVTAISLGWNPFLSTLLG 65 
+ ++ GLI+ ++ LG++++FR+L FPD+T +GSFPLGGAV T I+LGW+P+ +TL

Sbjct: 6 LFGALEIGLIFSLVALGVFISFRLLRFPDLTVDGSFPLGGAVCATLIALGWDPYSATLAA 65 



Query: 66 MLSGALAGFLTGLLYTKGKMPTLLAGILVMTSCNSIMLMVMGRANLGLHDHKRIQDCLPF 125 
+GALAG TGLL K K+ LLA IL+M + SI L +MG+ N+ L + L 
Sbjct: 66 TAAC3AIAGI^TGLLNVKLKIMDLIASIUWIALYSINLRIMGKPNVPLIAEPTtFTLLQP 125

Query: 126 SIDLNSLLTGLITWIVIS VLIY^YTNLGQAYIATGDNKDMAKSFGINTDWMEVMG 182 

+ + L+ V IVI+ +L +F T G A ATG N MA++ G+NT M ++G 
Sbjct: 126 EWLSDYVFRPLLLVFIVIAaiQjLLDVfFFTTQKGLA-RATGSNPRMAI^^VNTGGMILLG 185 

Query: 183 LWSNSLIALSGALVSQQDGYADVSKGIGVIVIGIASIIVGEVLYSTGLTLLERL1AIVI 242 

+ +SN+L+AL+GAL +Q G AD+S GIG IVIGLA++IVGE + + +L L A+++ 
Sbjct: 186 MAISNALVALAGALFAQTQGGADISMGIGTIVIGIAAVIVGESILPSRRIjILATL-AVIL 244 

Query: 243 GSILYQFLISWIT LGFNTSYLKLISALVLALCLMIPWKER 284 

G+I+Y+F 1++ + +G h L++A+++ + L+IP++K+R 

Sbjct: 245 GAI VYRF F I ALALNSD F I GLQAQDLNLVTAVLVTVALV I PMMKKR 289 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/287 (79%) , Positives = 259/287 (90%) 

Query: 1 MIISSVSQGLLWGILGLGIYLTFRILKFPDMTTEGSFPLGGRVCVTLMNQGVNPIIATIL 60 

MIISSVSCGL+WG+LGLGIYLTFRIL FPDMTTEGSFPLGGAV VT ++ G NP L+T+L 
Sbjct: 5 MIISSVSQGLIWGVLGLGIYLTFRIIiNFPDMTTEGSFPLGGAVAVTAISLGWNPFLSTLL 64 

Query: 61 GMLSGMIAGFVTGLLYTKGKIPTIIAGILWSCHSIML^KRMILGLNEIQTLKDFLP 120 
GMLSG IiAGF+TGLLYTKGK+PT+LAGILVMTSC+SIMLMVM RANLGL++ + ++D LP 

Query: 121 FSNDLNLLVLGLIAILLVISALIYFLYTRLGQAYIATGDNPDMAKSFGIDTDKMEMLGLI 130 

FS ELK L+ GLI +++VIS LIYFLYT LGOAYIATGDN DMAKSFGI+TD ME++GL+ 
Sbjct: 125 FSIDLNSLLTGLITWIVISVLIYFLYmT^GQAYIATGDNKDMAKSFGINTDWMEVMGLV 184 

Query: 181 VSNGLIALSGALVSQQDGYADVSKGIGVIVIGIASIIIGEVIiYSTGLTLFERLIAIVVGS 240 

VSN LIALSGALVSQQDGYADVSKGIGVIVIGIASII+GEVLYSTGLTL ERLIAIV+GS 
Sbjct: 185 VSNSLIALSGALVSQQDGYADVSKGIGVIVIGLASIIVGEVLYSTGLTLLERLIAIVIGS 244 

Query: 241 ILYQFLITAVIALGFOTOTLKLFSMVlGICMlVPVLKrKILKGVRL 287 

ILYQFLI+ VI LGFNT4YLKL SA+VL +CLM+PV+K + KGVRL 
Sbjct: 245 ILYQFLISWITLGEOTSYLKLISALVLALCLMIPVVKSRFFKGVRL 291 

A related GBS gene <SEQ ID 8681> and protein <SEQ ID 8682> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 

McG: Discrim Score: 4.24 

GvH: Signal Score (-7.5): -6.43 
Possible site: 24 

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 8 value: -15.12 threshold: 0.0 

INTEGRAL Likelihood =-15.12 Transmembrane 127 - 143 ( 119 - 151) 

INTEGRAL Likelihood = -7.54 Transmembrane 206 - 222 ( 201 - 225) 

INTEGRAL Likelihood = -6.48 Transmembrane 260 - 276 ( 258 - 282) 

INTEGRAL Likelihood = -5.84 Transmembrane 234 - 250 ( 231 - 257) 

INTEGRAL Likelihood = -4.78 Transmembrane 55 - 71 ( 54 - 72) 

INTEGRAL Likelihood = -3.61 Transmembrane 177 - 193 ( 176 - 194) 

INTEGRAL Likelihood = -3.35 Transmembrane 84 - 100 ( 83 - 102) 

INTEGRAL Likelihood = -1.91 Transmembrane 10 - 26 ( 10 - 26) 
PERIPHERAL Likelihood =4.77 36 
modified ALOM score: 3.52 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.7050 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
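
The INTEGRAL lines of the ALOM output list each predicted membrane-spanning segment with its likelihood score. The following is a minimal sketch of collecting these segments in sequence order; the function name is hypothetical, the input lines are copied from the output above, and the sketch only reads the report, it does not reimplement the ALOM prediction.

```python
import re

# Matches e.g. "INTEGRAL Likelihood =-15.12 Transmembrane 127 - 143 ( 119 - 151)".
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return (start, end, likelihood) tuples sorted by start position."""
    segments = [
        (int(m.group(2)), int(m.group(3)), float(m.group(1)))
        for m in INTEGRAL.finditer(text)
    ]
    return sorted(segments)


alom_output = """
INTEGRAL Likelihood =-15.12 Transmembrane 127 - 143 ( 119 - 151)
INTEGRAL Likelihood = -7.54 Transmembrane 206 - 222 ( 201 - 225)
INTEGRAL Likelihood = -6.48 Transmembrane 260 - 276 ( 258 - 282)
INTEGRAL Likelihood = -5.84 Transmembrane 234 - 250 ( 231 - 257)
INTEGRAL Likelihood = -4.78 Transmembrane 55 - 71 ( 54 - 72)
INTEGRAL Likelihood = -3.61 Transmembrane 177 - 193 ( 176 - 194)
INTEGRAL Likelihood = -3.35 Transmembrane 84 - 100 ( 83 - 102)
INTEGRAL Likelihood = -1.91 Transmembrane 10 - 26 ( 10 - 26)
"""
segments = parse_segments(alom_output)
for start, end, likelihood in segments:
    print(f"{start:>4} - {end:<4} length {end - start + 1:>2}  likelihood {likelihood}")
```

Sorting by start position makes the order of the eight segments along the protein easier to see than the likelihood-ranked order of the report.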

The protein has homology with the following sequences in the databases: 
ORF00338(298 - 1146 of 1461)

GP|9950013|gb|AAG07224.1|AE004801_2|AE004801(4 - 291 of 296) probable permease of ABC
transporter {Pseudomonas aeruginosa}
%Match =20.2
%Identity =40.8 %Similarity =68.3

Matches = 116 Mismatches = 84 Conservative Sub.s = 78 



YGIGLETAKQAIKVLRGKPVKDVPVKVIDTC^ 

I = =: II- 
MSLFSLFGALEIGLIF 



366 396 426 456 486 516 546 576 

GILGLGIYLTFRILKFPDMTTEGSFPLGGAVCVTLMNQGVNPIIATILGMLSGMLAGFVTGLLYTKGK1PTILAGILVMT
:= ||::::||:|:|||:| |h I :| Ih =1 |||: Mil I II =11 IN 

SLVALGVFISFRLLRFPDLTVDGSFPLGGAVCATLIALGWDPYSATLAATAAGAIAGIATGLIWKLK1MDLIASILMMI 
30 40 50 60 70 80 90 

606 636 690 720 747 777 807

SCHSIMLMVMKRANLG1NEIQTLKDFL- P- FSNDLNLLVLGLIAILLVI SALI - YFLYTRLGQAYIATGDNPDMAKSFGI 
: :|| I :| « I: I II : I I * : I = I I = I- I = = \ ■ h I I III II II- h 

ALYSINLRIMGKPNVPLIAEPTLFTLLQPEWLSDYVFRPIJ^OT^ 

110 120 130 140 150 160 170 


837 867 897 927 957 987 1017 1047 

DTDKMEMLGLIVSNGLIALSGALVSQQDGYADVSKGIGVIXIGLASIIIGEVLYSTGLTLFERLIAIWGSILYQFLITA 
:| I :||: =11 |>|h||| =1 I 11 = 1 III I Illh>|:|| « = :« I | . : : | : | : | . | . | 
NTGGMILLGmiSNALVAIAGALFAQTQGGADISMGIGTIVIGLAAVIVGESILPSRRLILATL-AVILGAIVYRFFI-- 
190 200 210 220 230 240 250

1077 1036 1116 1146 1176 1206 1236 1266 

VIALGFNTNY LKLFSAIVLGICLMVPVLKTKILKGVRL*W**KS*S*KKQPYKSvMV*QK*KRY*IMLI*VFM 

II = : I I :|::= : h = l = = l = = l 

- -ALALNSDFIGLQAQDIiNLOTAVLVTVALVIPMMKXRLLGKKGA

270 280 290 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 894

A DNA sequence (GBSx0948) was identified in S.agalactiae <SEQ ID 2717> which encodes the amino
acid sequence <SEQ ID 2718>. This protein is predicted to be an ABC transporter (potA). Analysis of this
protein sequence reveals the following:

Possible site: 36 
>>> Seems to have an uncleavable N-term signal seq



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9887> which encodes amino acid sequence <SEQ ID 9888> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 



Query: 19 MVMKIIELKEATOQVSNGJ^MKTILDHVNLSIYEHDFITILGGNGAGKSTIiFNVIAGTL 78
          M ++ + + G +L ++L++ DFITI+GGNGAGKSTIi N IAGT+
Sbjct: 1 MTTPVLTISDLHQTFEKGTINENHVLRGIDLTI>CJSGDFITIIGGNGAGKSTLLNSIAGTI 60

Query: 79 MLSSGNIYIMGQDVTNLSAEKHAKYLSRVFQDPKMOTAPRMTVAENrjLVAKFEGEiCRPDV 138
          G I + +++T S 4-R+K +SRVFQDP+MGTA R+TV ENL +A RG+ R
Sbjct: 61 PTEQGKIVIXSDKElTRHSVTRRSKEISRVFQDPRMGTAVRIiTVEENLALAYiaiGQVRGFS 120

Query: 139 PRKIINYTEEFQKLIARTGNGLDRHLETPTGLLSGGQRQALSLLMATLKKPNLLLLDEHT 198
           + F++ +AR GL+ L T GLLSGGQRQA++LLMATL++P L+LLDEHT
Sbjct: 121 SGVKGKHRAFFKEKLARLNLGLENRLTTEIG:jIiSGGQRQAITIjLMATLQQPKLILIjDEHT 180

Query: 199 AALDPRTSVSLMGLTDEFIKQDSLTALMITHHMEDALKyGNRVLVMKDGKIWDLNQAQK 258
           AALDP+TS+++M LTD+ I++ LTA M+TH MEDA++YGNR++++ GKIV D+ +K
Sbjct: 181 AABDPKTSMTVMALTDQLIQEQQLTAFMVTHDMEDAIRYGNRLIMLHQGKIVVDITGEEK 240

Query: 259 NKMAIADYYQLF 270
           + + D LF
Sbjct: 241 QSLTVPDLMALF 252



A related DNA sequence was identified in S.pyogenes <SEQ ID 2719> which encodes the amino acid
sequence <SEQ ID 2720>. Analysis of this protein sequence reveals the following:
Possible site: 58 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2249 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 186/250 (74%), Positives = 210/250 (83%)





Query: 22 KIIELKEATVQVSNGLAEMKTIIJJHVNLSIY^ 81
          KIIEL ATV V UG + KTILD+V L+ 1 YEHDF+TI LGGNGAGKSTLFNVI AGTL h+
Sbjct: 3 KIIELINATVDVDNGFEDAKTILDNVTLTIYEHDFLTILGGNGAGKSTLFNVIAGTLSLT 62

Query: 82 SGNIYIMGQDVTtttSAEKRAKYLSRVFQDPKMGTAP^lTVAENLLVAItFRGEKRPLVPRK 141
          G I I+GQDVT+ AEKRA YLSRVFQD KMGTAPRMTVAENLL+A+ RG KR L RK
Sbjct: 63 RGQIRILGQDOTHWPAEKRALYLSRVFQDSKMGTAPRMTVAENLLIARQRGGKRSIASRK 122

Query: 142 IINYTEEFQKLIARTGNGLDRHLETPTGLLSGGQRQALSLLMATLKKPNLLLLDEHTAAL 201
           I + F+ L+ RTGNGL++HLETP GLLSGGQRQALSLLMATLKKP LLLLDEHTAAL
Sbjct: 123 1TEHLASFEDLVKRTGNGLEKHLETPAGLLSGGQRQALSLLMATLKKPALLLLDEHTAAL 182

Query: 202 DPRTSVSLMGLTDEFIKQDSLTALMITHHMEDALKYGNRVLVMKDGKIVRDLNQAQKNKM 261
           DP+TS SLM LTDEF+ +D LTALMITHHMEDAL YGNR++VMKDG I++DLNQ +K ++
Sbjct: 183 DPKTSQSLMQLTDEFVTKDGLTALMITHHMEDALTYGNRLI VMKDGNI I KDLNQMEKEQL 242

Query: 262 AIADYYQLFD 271
           I DYYQLFD
Sbjct: 243 TITDYYQLFD 252
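
As an illustrative aside, the summary line that precedes an alignment such as the one above can be read mechanically. The following Python sketch is not part of the original analysis: the function name `parse_alignment_summary` and the regular expression are assumptions made for this example. It extracts the match counts, the alignment length and the reported percentages from a summary line, tolerating the stray spaces found in this text.

```python
import re

# Matches fields such as "Identities = 186/250 (74%)"; whitespace around the
# punctuation is optional so that loosely spaced lines are also accepted.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


summary = parse_alignment_summary(
    "Identities = 186/250 (74%), Positives = 210/250 (83%)"
)
identical, length, reported = summary["Identities"]
# Fraction of aligned positions that are identical in both sequences.
identity_fraction = identical / length
print(summary)
print(identity_fraction)
```

Keeping the raw counts alongside the reported percentage lets the fraction be recomputed at whatever precision is needed, rather than relying on the whole-number percentage printed in the summary.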





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 895 

A DNA sequence (GBSx0949) was identified in S.agalactiae <SEQ ID 2721> which encodes the amino
acid sequence <SEQ ID 2722>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1930 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
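
For readers who wish to tabulate localization predictions of this kind, the "Final Results" block can be parsed with a few lines of Python. This is a minimal sketch under stated assumptions: the names `parse_final_results` and `most_likely_site` are hypothetical, and the pattern is written only for the three-line layout shown in this document, not for any particular program's complete output.

```python
import re

# One result line looks like:
#   bacterial cytoplasm --- Certainty=0.1930 (Affirmative) < succ>
# The value group also accepts embedded spaces ("0 . 1930"), which are removed.
RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d. ]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, verdict)} for each 'bacterial <site>' line."""
    return {
        site: (float(value.replace(" ", "")), verdict.strip())
        for site, value, verdict in RESULT.findall(text)
    }


def most_likely_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


block = """\
bacterial cytoplasm --- Certainty=0.1930 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(most_likely_site(results), results)
```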



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06117 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 236/549 (42%), Positives = 362/549 (64%), Gaps = 2/549 (0%) 



[Alignment rows begin at Query/Sbjct positions 4/9, 64/69, 124/128, 184/188, 244/248, 304/307, 364/367, 424/427, 484/487 and 544/547. The sequence rows were not recovered; only the following match-line fragments survive.]

I LTHGH D IG LPY++ +4 PV+G+ LT+ I
+SFF+T HSIP+S+GI I T +G IV+TGDFKFDQ
GVL LLSDS NA
SE EVG I E +GR+IV ASN+ R+QQV AA
• 1 +++++Y+D + 1+ TG GEP+
EL+L++NL++PK+ PI GS+R AH LA 4- VG+
+G VP+G+V+IDG 4GDVGNIVLRDR++LS+DGI +W+T++K+ I+S
h E+ ELV T++
+W LK VR+ +S+FLF++TKRRP



A related DNA sequence was identified in S.pyogenes <SEQ ID 2723> which encodes the amino acid 
sequence <SEQ ID 2724>. Analysis of this protein sequence reveals the following: 

Possible site: 33 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2204 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB06117 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 232/549 (42%) , Positives = 360/549 (65%) , Gaps = 2/549 (0%) 

Query: 4 IKMIALGGWEYGKMFYI.VEINDSMFII£)AGLKfPENEQLGVDLVIPNLDYVIENKGKVQ 63

I++ ALGGV E GKN Y+VE++D +F++DAGL +P++E LGVD+VIP++ Y++EKT+ +V+ 
Sbjct: 9 IRVFALGGVGEIGKMMYVVEVDDDLFVIDAGLMFPDDEMI.GVDWIPDISYLVENEERVR 68 

Query: 64 GIFLSHGHADAIGALPYLLAEVSAPVFGSELTIE1AKLFVKSHNSTKKFNHFHVYDSDTE 123 

X L+HGH D IG LPY+L +++ PV+G++LT+ L + +K + ++DS++ 

Sbjct: 69 AILLTHGHEDHIGGLPYVLQKLN\?PVYGTKLTLGLVEEKLI<EAGLIRSAK-LKLIDSNSR 127 

Query: 124 IEFKDGLVSFFRTTHSIPESMGIVIGTDKGNIIYTGDFKFDQAAREGYQTDLLRLAEIGK 183 

++ VSFFRT HSIP+S+GI I T +G I++TGDFKFDQ +G Q ++ ++A IG 

Sbjct: 128 LKLGSTPVSFFRTNHS1PDSVGICIQTSQGFIVHTGDFKFDQTPVDGKQAEIGKMAAIGH 187 



Query: 244 AHGRRWLTGTDAEMIVRTALRLEKLMITDERLLIKPKDMSKFEDHELIIIiEAGRMGEPI 303 

A R++ + G +V A RL L D+ L 1 +++SK++D + 1+ G GEP+ 

Sbjct: 248 ATNRKLAVAGRSMVKWSIAERLGYLEAPDD-LFIDIEEVSKYDDERVAI1TTGSQGEPM 306 

Query: 304 NSLQKMAAGRHRWQrKEGDLVYIVTTPSTAKFJyWARVENLIYKAGGSVia,ITQNLRVS 363 

++L +MA GHR+IEDVI TP E V+ + +L+++ G V + S 

Sbjct: 307 SALSRMAKGAHRQITITENDTVI1AATPIPGNERSVSTIVDLLHRIGADVIFGHGKVHAS 366 

Query: 364 GHANGRDLQLLMI^LKPQYLFPVQGEYRDIAAHAKIJ^EVGIFPENIHILKRGDIMVIiND 423 

GH + +L+L++NL++P++ P+ GE+R AH +LA+ VGI E I ++ +G+++ + 
Sbjct: 367 GHGSAEELKLMMLMRPKFFVPIHGEFRMQHAHKEIAKSVGIREEAIFLVDKGEWEFRN 426 

Query: 424 EGFLHEGGVPASDVMIDGNAIGDVGNIVLRDRKVLSEDGIFIVAITVSKKEKRIISKAKV 483 

G VP+ +V+IDG +GDVGNIVLRDR++LS+DGI +V +T++K+ I+S + 
Sbjct: 427 GQGRKAGKVPSGNVLIDGLGVGDVGNIVLRDRRLLSKDGIL'VVVVTIjNKQSGTILSGPNI 486 

Query: 484 OTRGFVYVKKSHDILRESAELVOTTOGNYIiKKDTFDWGELKGNVRDDLSKFLFEQTKRRP 543 

+RGFVYV++S ++ E+ ELV T+ + ++ +W LK NVR+ LS+FLFE+TKRRP 
Sbjct: 487 ISRGFVYWESEKLIEEANELVTETLKKCVTEW/^WSSLKSNTOEVLSRFliFEKTKRRP 546 

Query: 544 AILPWMEV 552 

ILP++MEV 
Sbjct: 547 MILPIIMEV 555 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 446/553 (80%) , Positives = 513/553 (92%) 

Query: 1 MSDIKIMALGGVRENGKNLYVTOVMJSXFVLDAGIjKYPENEQLGVDWIPNLDYLIENKK 60 

M+DIK+4ALGGVRE GKN Y4-VE+NDS+F+LDAGLKYPEHEQLGVD+VIPNLDY+IENK 
Sbjct: 1 MTDIKMIALGGVREYGKNFYLVEINDSMFIIjDAGLKYPENEQLGVDLVieMLDYVIEWKG 60 

Query: 61 RVQGIFliTHGHADAIGALPY'IIAEVTQ\PVFGSPLTIELAKLFVKNSTAVKKFNNFHVIDS 120 

+VQGIFL+HGHADAIGALPY++AEV APVFGS LTIELAKLFVK++ + KKFNNFHV+DS 
Sbjct: 61 K^QGIFLSHGHADAIGALPYLLAEVSAPVFGSELTISLAKLFWSNNSTKKFNNFHVVDS 120 

Query: 121 ETEIEFQDAVISFFKTTHSIPESMGIVIGTKEGNI VYTGDFKFDQAARKYYQTDLARLAE 180 

+TEIEF+D ++SFF+TTHSIPESMGIVIGT +GNI+YTGDFKFDQAAR+- YQTDL RLAE 
Sbjct: 121 DTEIEFKDGLVSFFRTTHSIPESMGIVIGTDKGNIIYTGDFKFDQAAREGYQTDLIaRLAE 180 

Query: 181 IGRDGVLALLSDSATS1ATSNEQVASEYEVGDE1KSVIEDAEGRVIVAAVASNLIRIQQVFD 240 

IG++GVLALLSDS NATSN+Q+ASE EVG+E-f SVI DA+GRVIVAAVASNL+RIQQVFD 
Sbjct: 181 IGKEGVIALLSDSVNATSNDQIASESEVGEEMDSVISDADGRVIVAAVASMLVRIQQVFD 240 

Query: 241 AAAENGF^WLTGFDIENITOTAIRMKRIHIADENMIIKPKDMTRYEDNELLILETGRMG 300 

+A +GRRWLTG D ENIVRTA+R++++ I DE ++IKPKDM+++ED+EL+ILE GRMG 
Sbjct: 241 SATAHGRRVVLTGTDAENITOTALRLEKLMITDFJRLLIKPKDMSKFEDHELIILEAGRMG 300 

Query: 301 EPINGLQKmiGRRllWQIKDGDLWIAmPSIAKEAWARWNLIYKAGGSVia.ITQ^ 360 

EPIN LQKMA GRHRYVQIK+GDLV+IVTTPS AKEA+YARVENLIYKAGGSVKLITQNL 
Sbjct: 301 EPINSLQKMAAGRHRYVQIKEGDIIWIVTTPSTAKEA^IVARvENLIYKAGGSVKIIITQNL 360 

Query: 421
           L +GF H G VPA DVMIDGNAIGDVGNIVLRDRKVLSEDGIFIV ITVSKKEK+IISK
Sbjct: 421 LNDEGFLHEGGVPASDVMIDGNAIGDVGNIVLRDRKVLSEDG1 FIVAITVSKKEKRI1 SK 480

Query: 481 ARVOTRGFVYVTCKSRDILRESAELVOTTVEDYLSKDTFDWGELKGKVRDEVSKFLFDQTK 540
           A+VNTRGFVYVKKS DILRESAELVNTTV +YL KDTFDWGELKG VRD++SKFLF+QTK
Sbjct: 481 AKmTRGFVWKKSHDILRESAELVOTWGNYLKKOTFDWGELKGNVRDDLSKFLFEQTK 540

Query: 541 RRPAILPWMEVR 553
           RRPAILPWMEVR
Sbjct: 541 RRPAILPWMEVR 553

There is also homology to SEQ ID 4910. 

SEQ ID 2722 (GBS295) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 48 (lane 2; MW 89.4kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 167 (lane 9 & 11; MW 79kDa - 
thioredoxin fusion) and in Figure 238 (lane 3; MW 79kDa - thioredoxin fusion). 

Purified Thio-GBS295-His is shown in Figure 244, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 896 

A DNA sequence (GBSx0950) was identified in S.agalactiae <SEQ ID 2725> which encodes the amino 
acid sequence <SEQ ID 2726>. This protein is predicted to be tributyrin esterase. Analysis of this protein 
sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9885> which encodes amino acid sequence <SEQ ID 9886> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF62859 GB:AF157484 tributyrin esterase [Lactococcus lactis 
subsp. lactis] 

Identities = 154/262 (58%) , Positives = 188/262 (70%) , Gaps = 4/262 (1%) 

Query: 21 MAFFNIEYHSKVIjGTERCVNVIYPDAFEMSDDKIDDCDIPVLYLLHGMGGNENSWQKRTN 80 

MA NIEY+S+VLG R+VNVIYP++ ++ D DIPVLYLLHGM GNENSW R+ 

Sbjct: 1 MAVINIEYYSEVLGMNRKV1WIYPESSKVED--FTQTDIPVLYLLHGMSGNENSWIIRSG 58 

IERL+RHTNL +VMPSTDL +Y NT YG++YFDAIA ELPKV+ FFPN+S KREKNFIA 
Sbjct: 59 IERLIRHTNLAIVMPSTDLGFYVIWTYGMNVFI^IAHELPKVINNFFPNLSTKREKNFIA 118 



Query: 141 GLSMGGYGAYKIALLTNRFSHAMLSGALSFDFDLLFffi-IGN^ 200 
GLSMGGYGAY++AL T+ FS+A&SLSG L+FD + N N YW GIFG+ 
Sbjct: 119 GLSMGGYGAYRIALGTDYFSYAASLSGVLTFDG-

Query: 201 ERHSLRRYVESFDMKTKFYAWCGYEDFLFEANEVAIDELRQLGIjTIDYFMDHGKHEWYYW 260 

+ L + K K YAWCG +DFLF NE A EL++LG I Y 4 G HEWYYW 

Sbjct: 177 DNEILSiyffiRKQENKPKLYAWCGKQDFLFPGNEYATAEIjKKLGFDITYESSDGVHEWYYW 236 

Query: 261 NQQLEKVLEWLPVDYVKEERLS 282 

Q++E VL+WLP++Y +EERLS 
Sbjct: 237 TQKIESVLKWLPINYKQEERLS 253 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2727> which encodes the amino acid 
sequence <SEQ ID 2728>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2183 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 172/262 (65%) , Positives = 199/262 (75%) , Gaps = 1/262 (0%) 



Query: 81 IERLLRHTNLIVVMPSTDLAVrV'TJSTTKYGLDYFDAIArELPKVLKRFFPNMSDKREKNFIA 140 

IERLLRHTNLIWMPSTDL WYT+T YGL+Y+ A++ ELP+VL FFPNM+ KREK F+A 
Sbjct: 61 IERLLRHTNLIVVMPSTDLGWYTDTAYGLNYYFAIiSQELPQvliAAFFPlWITQKREKTFVA 120 

Query: 141 GLSMGGYGAYKIALLTNRFSHAASLSGALSFDFDLLFNNGNNNINYWSGIFGDLNNTDNI 200 

GLSMGGYGA+K AL +NRFS+AAS SGAL F + L + YW G+FG ++ D + 

Sbjct: 121 GLSMGGYGAFKWALKSNRFSYAASFSGALDFSPETLLEGKLGELAYWQGVFGQFDDPD-L 179 



Query: 261 NQQLEKVLEWLPVDYVKEERLS 282 

NQQLE +LEWLP++Y KEERLS 
Sbjct: 240 NQQLEVLLEWLPINYQKEERLS 261 
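
The identity count for an aligned row such as the final one above can be checked directly from the two sequence strings. The sketch below is illustrative only, and the function names are assumptions. It reproduces the identical positions of the middle line; the `+` marks for conservative substitutions are deliberately not reproduced, since they depend on a substitution matrix that is not specified in this text.

```python
def identity_line(query, sbjct):
    """Return the residue where both rows agree and a space elsewhere."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(q if q == s else " " for q, s in zip(query, sbjct))


def count_identities(query, sbjct):
    """Count aligned positions with the same residue, ignoring gap characters."""
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


query = "NQQLEKVLEWLPVDYVKEERLS"  # Query row 261-282 above
sbjct = "NQQLEVLLEWLPINYQKEERLS"  # Sbjct row 240-261 above
print(repr(identity_line(query, sbjct)))
print(count_identities(query, sbjct), "of", len(query))
```

For this row, 17 of the 22 aligned positions are identical, and the identical residues coincide with the letters printed in the middle line of the alignment.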

SEQ ID 2726 (GBS645) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 129 (lanes 8 & 10; MW 60kDa + lane 9; MW 27kDa) and in Figure 186 (lane 4;
MW 60kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 129 (lane 12; MW 34.7kDa), in Figure 140 (lane 8; MW 35kDa) and in Figure 
178 (lane 4; MW 35kDa). Purified GBS645-GST is shown in Figure 236, lane 11; purified GBS645-His is 
shown in Figure 229, lanes 3-4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 897 

A DNA sequence (GBSx0951) was identified in S.agalactiae <SEQ ID 2729> which encodes the amino
acid sequence <SEQ ID 2730>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.34 Transmembrane 22 - 38 ( 18 - 46)

Final Results

bacterial membrane --- Certainty=0.4736 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2731> which encodes the amino acid
sequence <SEQ ID 2732>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -7.43 Transmembrane 25 - 41 ( 20 - 46) 
INTEGRAL Likelihood = -2.71 Transmembrane 4 - 20 ( 3 - 20) 

Final Results 

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
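
The INTEGRAL lines in the output above each report one predicted transmembrane segment. A short Python sketch, with a hypothetical `Segment` type and `parse_segments` helper that are not part of the original analysis, shows how such lines might be collected and ranked. In the outputs reproduced in this document the most negative likelihood accompanies the highest membrane certainty, so the sketch treats the most negative value as the strongest prediction; that reading is an assumption drawn from these examples.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Matches e.g. "INTEGRAL Likelihood = -7.43 Transmembrane 25 - 41 ( 20 - 46)";
# the bracketed outer range is ignored here.
PATTERN = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return one Segment per INTEGRAL line found in the text."""
    return [
        Segment(float(lik), int(start), int(end))
        for lik, start, end in PATTERN.findall(text)
    ]


output = """\
INTEGRAL Likelihood = -7.43 Transmembrane 25 - 41 ( 20 - 46)
INTEGRAL Likelihood = -2.71 Transmembrane 4 - 20 ( 3 - 20)
"""
segments = parse_segments(output)
strongest = min(segments, key=lambda seg: seg.likelihood)
print(strongest, strongest.end - strongest.start + 1)
```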

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 31/87 (35%) , Positives = 50/87 (56%) , Gaps = 2/87 (2%) 

Query: 1 MRTLFRMIFAIPKFIFRLIWNIIWGIFKTVLVIAIILFGLYYYANHSQSEFANQLSDIIQ 60 

M+ L +1 +PK I ++ W++I G +T+L++ 11+ GL YY+NHS S AN++S I 
Sbjct: 1 MKQLLAIILWLPKLIVKMFWHLIKGFLQTILLVTIIIIGLMYYSNHSDSVLANKIS--IV 58 

Query: 61 TGKTFLNF7ADTNQLKNS FTNLATDNVH 87 

T+ F Q++T +NH 
Sbjct: 59 TEQWQIFDILTQKPSAKTRHGSGNSH 85 

SEQ ID 2730 (GBS220d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 155 (lane 11-13; MW 50kDa) and in Figure 239 (lane 12; MW 50kDa). It 
was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in 
Figure 155 (lane 14-16; MW 25.2kDa) and in Figure 184 (lane 7; MW 25kDa). Purified GBS220d-GST is 
shown in Figure 246, lanes 3 & 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 898 

A DNA sequence (GBSx0953) was identified in S.agalactiae <SEQ ID 2733> which encodes the amino 
acid sequence <SEQ ID 2734>. This protein is predicted to be unnamed protein product (rpiA). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2538 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

Query: 2 DELKKLAGOTAAKWKNGMIVGLGTGSTAYFFVEEIGERVKEEC3L-QWGVTTSNRTTEQ 60

D+LKKLA A VK+GM+ +GLGTGSTA F V IG + L +VG+ TS RT EQ 

Sbjct: 59 DDLKKLAAEKAVDSVKSGMVLGLGTGSTARFAVSRIGELLSAGKLTNIVGIPTSKRTAEQ 118 

Query: 61 ARGLGIPLKSADDIDVIDVTVDGADEVDPDFKGIKGGGGALLMEKIVATPTKEYIWVVDE 120 

A LGIPL DD ID+ +DGADEVDPD N +KG GGALL EK+V + ++I WD+ 

Sbjct: 119 AASLGIPLSVLDDHPRIDIAIDGADETOPDLNLVKGRGGALLREKMVEAASDKFIVVVDD 178 

Query: 121 SKLVETLGAFKL- -PVEW RYGSERLFRVFKSKGYCPSFRETEGDR--FITDMGNY 172 

+KLV+ LG +L PVEW +Y +RL +FK G C + EGD ++TD NY 

Sbjct: 179 TKIiVDGLGGSRIjAMPVEWQFCWKYNLKRLQEIFKELG-CEIAKLRMEGDSSPYOTDNSNY 237 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2735> which encodes the amino acid 
sequence <SEQ ID 2736>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1645 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 166/222 (74%), Positives = 190/222 (84%) 

Query: 1 MDELKKIJIGVTAAKYVKNGMIVGLGTGSTAYFFVEEIGRRVKEEGLQVVGVTTSNRTTEQ 60 
M+ LKK+AGVTAA+YV +GM +GLGTGSTAY+FVEE IGRRVK+EGLQWGVTTS+ T++Q 
Sbjct: 1 MEALKKIAGVTAAQYVTDGMTIGLGTGSTAYYFVEEIGRRVKQEGLQWGVTTSSVTSKQ 60

Query: 61 ARGLGIPLKSADDIDVIDVTVDGADEVDPDFNGIKGGGGALLMEKIVATPTKEYIWVVDE 120 

A LGIPLKS DDID ID+TVDGADEVD +FNGIKGGG ALLMEKIVATPTKEYIWWD 
Sbjct: 61 AEVLGIPLKSIDDIDSIDLTVDGADEVDKNFNGIKGGGAALLMEKIVATPTKEYIWVVDA 120 


Query: 121 SKLVETLGAFKL PVEWRYGSERIiFRVFKSKGYCPSFRETEGDRFITDMGNYIIDLDLKK 180 

SK+VE LGAFKLPVEW+YG++RLFRVF+ GY PSFR R +TDM NYIIDLDL 

Sbjct: 121 SKMVEHLGAFKLPVEWQYGADRLFRVFEKAGYKPSFRMKGDSRLVTDMQNYIIDLDLGC 180 

Query: 181 IEDPKQLANELDHTvGWEHGLFNGMVNKVIVAGKMGLDILE 222

I+DP + LD TVGWEHGLFNGMV+KVIVA K+G+ +LE 
Sbjct: 181 IKDPVAFGHLLDGTVGWEHGLFNGMVDKVTVASKDGVTVLE 222 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 899 

A DNA sequence (GBSx0954) was identified in S.agalactiae <SEQ ID 2737> which encodes the amino
acid sequence <SEQ ID 2738>. This protein is predicted to be phosphopentomutase (deoB). Analysis of
this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0546 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45496 GB:U80410 phosphopentomutase [Lactococcus lactis subsp. cremoris] 
Identities = 275/408 (67%) , Positives = 325/408 (79%) , Gaps = 7/408 (1%) 

Query: 3 QFDRIHLWLDSVGIGAAPDANDPVNAGVP DGASDTLGHISKTVGLAVPNMAKI 56
         +F RIHLW+DSVGIGAAPDA+ FN V D SDT+GHIS+ GL VPN+ K+
Sbjct: 4 KFGRIHLWMDSVGIGAAPDADKFFNHDVETHEAINDVKSDTIGHISEIRGLDVPNLQKL 63

G GNIPR LKT+PA + P+ Y TKL+E+S GKDTKTGHWEIMGLNI PF T+

ED++ KIE+FSGRK+IREANKPYSGTAVI+DFGPRQ+ETGELIIYTSADPVLQIAAHED+

I EELY+ICEY RSIT+E ++ GRIIARPYVGE GNF RT R DYA+SPF +TVL

KL +AGIDTY+VGKI+DIFN G+ +DMGHN ++ G+D L+K M +EF +GFSFTNLV

DFDA YGHRRD GY + +FD RLPEII AM++ DLL+ ITADHGNDP+Y GTDHTREY

            IPL+ +S SF + + PVGHFAD I SAT+A+NF V A GESFL LV
Sbjct: 364 IPLVIFSKSFKEPKVBPVGHFADISATIAENFSVKKAQTGESFLDALV 411

A related DNA sequence was identified in S.pyogenes <SEQ ID 2739> which encodes the amino acid
sequence <SEQ ID 2740>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0185 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 348/402 (86%) , Positives = 374/402 (92%) 

Query: 1 MSQFDRIHLvVLDSVGIGAAPDAI'IDFi/NAGVPDGASDTLGHISKTVGLAVPNMAKIGLGN 60 

MS+F+RIHLWLDSVGIGAAPDA+ F NAGV D SDTLGHIS+ GL+VPNMAKIGLGN 
Sbjct: 1 MSKFmiHLVVLDSVGIGAaPDADKFFWAGVADTDSDTLGHISEAAGLSVPNMAKIGLGN 60 

Query: 61 IPRPQALKTVPAEENPSGYATKLQEVSLGKDTMTGHWEIMGLNITEPFDTFWNGFPEDII 120 

I RP LKTVP E+NP+GY TKL+EVSLGKDTMTGHWEIMGLNITEPFDTFWNGFPE+I+ 
Sbjct: 61 ISRPIPLKTVPTEDNPTGYVTKLEEVSIjSKDTMTGHWEIMGLNITEPFDTFWNGFPEEIL 120 

Query: 121 TKIEDFSGRKVIREANKPYSGTAVIDDFGPRQMETGELIIYTSADPVLQIAAHEDIIPLE 180 

TKIE+FSGRK+ IREANKPYSGTAVIDDFGPRQMETGELI + YTSADPVLQIAAHEDI IP+E 
Sbjct: 121 TKIEEFSGRKIIREANKPYSGTAVIDDFGPRQMETGELIVYTSADPVLQIAAHEDIIPVE 180 

Query: 181 ELYRICEYARSITMERPALLGRI IARPYVGEPGNF7RTANRHDYAVSPFEDTVLNKLDQA 240 

ELY+ICEYARSXT+ERPADLGRIIARPYVG+PGNFTRTANRHDYAVSPF+DTVENKL A 
Sbjct: 181 ELYKICEYARSITLERPALLGRIIARPYVGDPGNFTRTANRHDYAVSPFQDTVLNKLADA 240 





Query: 241 GIDTYAVGKINDIFNGSGINHDMGHNKSNSHGIDTLIKTMGLSEFEKGFSFTNLVDFDAL 300

G+ TYAVGKINDIFNGSGI +DMGKNKSNSHGIDTLIKT+ L EF KGFSFTNLVDFDA 
Sbjct: 241 GVPTYAVGKINDIFNGSGITNDMGENKSNSHGIDTLIKTLQLPEFTKGFSFTNLVDFDAN 300 

Query: 301 YGHRRDPHGYRDCLHEFDERLPEIISAMRDKDLLLITADHGNDPTYAGTDHTREYIPLLA 360 

+GHRRDP GYRDCLHEFD RLPEII+ M++ DLLLITADHGNDPTYAGTDHTREYIPLLA 
Sbjct: 301 FGHRRDPEGYRDCLHEFDNRLPEI1ANMKEDDLLLITADHGNDPTYAGTDHTREYIPLLA 360 

Query: 361 YSPSFTGNGLIPVGHFADISATVADNFGVDTAMIGESFLQDL 402 

YS SFTGNGLIP GHFADISATVA+NFGVDTAMIGESFL h 
Sbjct: 361 YSVSFTGNGLI PQGHFADI SATVAENFGVDTAMIGESFLSHL 402 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 900 

A DNA sequence (GBSx0955) was identified in S.agalactiae <SEQ ID 2741> which encodes the amino 
acid sequence <SEQ ID 2742>. This protein is predicted to be unnamed protein product (mtaP). Analysis 
of this protein sequence reveals the following: 



- 231 ( 215 - 231) 

Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 2743> which encodes the amino acid 
sequence <SEQ ID 2744>. Analysis of this protein sequence reveals the following: 

Possible site: 36 



Final Results 

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 225/269 (83%) , Positives = 248/269 (91%) 

Query: 1 MTLLEKINETRDFLQAKGVTAPEFGLILGSGLGELAEEIENPIVVDYADIPNWGQSTVVG 60
         M+L+ KINET+DFL KG+ PEFGLILGSGLGELAEE+EN IV+DYADIPNWG+STWG
Sbjct: 1 MSLMTKINETKDFLVTKGIETPEFGLILGSGLGELAEEVENAIVIDYADIPNWGKSTVVG 60

Query: 61 HAGKLVYGDLSGRKVIALCGRFHFYEGNTMEWrFPTOIMRALftCHSvLvTNAAGGIGYG 120
          HAGKLVYGDL+GRKVLALQGRFHFYEGN +EWTFPVR+M+AL C VLVTNARGGIGYG

PGTLiM I DHINM G NPLIGENL+EFGPRFFDMSDAYT YR KAH++AEK NIKLE+G

VY+G++GPTYETPAEIRAF+ +GA AVGMSTVPEVIVAAHSGLKVLGISAITNFAAGFQS

Query: 241 ELNHEEWEVTQRIKEDFKGLVKSLVAEu 269

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 901 

A DNA sequence (GBSx0956) was identified in S.agalactiae <SEQ ID 2745> which encodes the amino
acid sequence <SEQ ID 2746>. Analysis of this protein sequence reveals the following:



Possible site: 31
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = [illegible] Transmembrane 266 - 282 ( 263 - 289)
INTEGRAL Likelihood = [illegible] Transmembrane 231 - 247 ( 229 - 253)
INTEGRAL Likelihood = [illegible] Transmembrane 356 - 372 ( 352 - 376)
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 319 ( 297 - 326)
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 353 ( 334 - 355)
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 407 ( 387 - 409)
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 193 ( 177 - 193)
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 175 ( 159 - 175)
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 214 ( 196 - 215)

Final Results

bacterial membrane --- Certainty=0.4736 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9883> which encodes amino acid sequence <SEQ ID 9884> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD53928 GB:AF179611 chloride channel protein [Zymomonas 
mobilis] 

Identities = 121/410 (29%) , Positives = 213/410 (51%) , Gaps = 19/410 (4%) 

Query: 14 VKFMIAVLFMTVmGVGAILMHYVLMFTEWIAFGDSRENTLSLLN SVTPIKRVL 67
          +++ +A L + + G+G +L+ ++L + +A+G S ++ +S + + +P++R+
Sbjct: 3 IRYGLACLAVGCLTGLGGMLLSWILHAVQHIAYGYSLQHVISEESFLKGSMAASPLRRLE 62

[The remaining rows begin at Query/Sbjct positions 68/63, 128/120, 188/180, 246/236, 303/296 and 362/356. Their sequence rows were not recovered; only the following match-line fragments survive.]

++ L ++ +A +A -t
V L+ L PI
++G A FLA +M+ P+TA+ LVI F

A related DNA sequence was identified in S.pyogenes <SEQ ID 2747> which encodes the amino acid
sequence <SEQ ID 2748>. Analysis of this protein sequence reveals the following:



Possible site: 13
>>> Seems to have no N-terminal signal sequence
[Nine INTEGRAL Likelihood / Transmembrane entries were listed here; of their values only the fragments "2B4 - 300" and "184 - 201" are legible.]

Final Results

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:AAF41386 GB:AE002449 chloride channel protein-related protein
[Neisseria meningitidis MC58]
Identities = 137/373 (36%), Positives = 201/373 (53%), Gaps = 23/373 (6%) 

Query: 59 IHLIQSLSFGFSQG SFSTMIASVPPQRRALSLLFAGLLAGLGWHLLAKKGKDIQSI 114
          +H IQ ++G+ SF +A RR L G +AG GW LL + GK I
Sbjct: 1 MHFIQHTAYGYGADGVYTSFREGVAQASGMRRVAVLTLCGAVAGSGWWLLKRFGKPQIEI 60

Query: 115 QQIIQDDISFSPW-TQFWHGWLQLTTVSMGAPVGREGASREVAVTLTSLWSQRCNLSKAD 173
           + ++ 4 P+ T +H LQ+ TV +G+P+GRE A RE+ +R L + +
Sbjct: 61

[The remaining rows begin at Query/Sbjct positions 174/121, 234/179, 292/238, 346/292 and 406/351. Only the following fragments of those rows survive.]

+LL+ACASGA L AVYN PLA+ LFILEA+L W+ -i
r T+ T L S + G IL 4
F+LI +S++FPEILGNGKAG L F L 4

- -LSYISWLLVAKAVAISLVFASGA 345
L+ + WL+V A+A+ GA
jGLTAVKWLWLMALAV GA 291

V P +S+ A -t-VGA +FLGV K+PL A F++



An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/415 (31%) , Positives = 215/415 (51%) , Gaps = 9/415 (2%) 

Query: 2 LNFKMVSRLYYAWFM1AVLFMT- vMAGVGAILMHYVLMFTEWIAFGDSRENTLSLLNSV 60 

LNF S + + LF+T + AG+ A ++ + + L+FG S+ + +++ SV 

Sbjct: 22 LNFCYNSLMKRHFLLLTFYLFLTGLTAGLVAFILTKAIHLIQSLSFGFSQGSFSTMIASV 81 

Query: 61 TPIKRVLSLTLVSFLASLSWYYLQIKPKQITSIKQQVVFKDFSVKKSPYWLHIGHAFLQL 120 

P +R LSL LA L W+ L K K I SI QQ-H+ D S SP W H +LQL 

Sbjct: 82 PPQRRALSLLFAGLIAGLGWHLLAKKBKDIQSI-CjQIIQDDISF--SP-WTQFWHGWLQL 137 

Query: 121 IVVGTGGPIGKEGAPREFGAINAGKISDLLALKVLDKRLLIISGAAAGLSAVYQVPLASV 180 






Query: 181 FFAFETLALGISLKNIVTLIASTFGAA3 IAQLVI STAPL - YHISKMSLNSQSLAFMFLIV 239 

F E + SLKNI +++ A L+ + Y + + +L L 

Sbjct: 198 LFI LEAI LI^WSLKNI YARCLTS WAVETVALLQGRHE IQYLMPQQHWTLGTLIGS VLAG 257 

Query: 240 LCVTPIAISFRYLNQKVTERRIKNIKILLSLPWSLIVSVLSIVYPQILGNGNA-LVQEV 298 

L ++ A ++++L + + + K+ + + + +++ LSI +P+ILGNG A L+ + 
Sbjct: 258 L I LSLFAHAYKHLLKHLPK7ADAKSQWF I PKVLIAFSL IAGLSI FFPE ILGNGKAGLLFFL 317 

Query: 299 FKGTTVSLIAILWLKMIATLSTLYAGAYGGILTPSFSIGACLGFLLASISIPLLP-HTS 357 

+ +S 1+ L+V K +A +GA GG + PS +G G LLA +S L+P +S 

Sbjct: 318 HEEPHLSYISWLLVAKAVAISLVFASGAKGGKIAPSM4LGGASGLLLAILSQYLIPLSLS 377 

Query: 358 IVTSMLVGAAIFIAITMRAPLTAVGLVI3FTGQSVITIVPLTIA-VLFATAYDYF 411 

+++VGA IFL + + PL A ++ TGQS++ I+PL +A ++F +Y ++ 
Sbjct: 378 NTLAIMVGATIFLGVINKIPLAAPVFLVEITGQSLLMIIPLALANLIFYFSYQFY 432 



A related GBS gene <SEQ ID 8683> and protein <SEQ ID 8684> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
SRCFLG: 0
McG: Length of UR: 19
Peak Value of UR: 2.96
Net Charge of CR: 2
McG: Discrim Score: 9.64
GvH: Signal Score (-7.5): 1.15
Possible site: 26
>>> Seems to have a cleavable N-term signal seq.



Amino Acid Composition: calculated from 27
ALOM program -9.34 threshold: 0.0
INTEGRAL Likelihood = -9.34 Transmembrane 261 - 277 ( 258 - 284)
INTEGRAL Likelihood = -8.97 Transmembrane 226 - 242 ( 224 - 248)
INTEGRAL Likelihood = -7.70 Transmembrane 351 - 367 ( 347 - 371)
INTEGRAL Likelihood = -7.32 Transmembrane 298 - 314 ( 292 - 321)
INTEGRAL Likelihood = -5.[illegible] Transmembrane 332 - [illegible] ( 329 - 350)
INTEGRAL Likelihood = -5.57 Transmembrane 386 - 402 ( 382 - 404)
INTEGRAL Likelihood = -2.[illegible] Transmembrane 172 - 188 ( 172 - 188)
INTEGRAL Likelihood = -1.01 Transmembrane 154 - 170 ( 154 - 170)
INTEGRAL Likelihood = -0.43 Transmembrane 193 - 209 ( 191 - 210)
PERIPHERAL Likelihood = 1.22 61
modified ALOM score: 2.37
CFP: 0.474

* Reasoning Step:

Final Results

bacterial membrane --- Certainty=0.4736 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF00327(340 - 1533 of 1869)

GP|5834362|gb|AAD53928.1|AF179611_12|AF179611(3 - 405 of 425) chloride channel protein
{Zymomonas mobilis}
%Match =14.7 

%Identity = 30.2 %Similarity = 56.1 

Matches = 121 Mismatches = 169 Conservative Sub.s = 104 



270 300 330 360 390 420



RSLKLLSvLKKISRD*LlNra*LIiNFKMVSRLYYAVK?MIAVLF L 

::: : | | : : |:| :|: : : | : :|:| | : : ,| | 
MKIRYGLACLAVGCLTGLGGMLLSWILHAVQHIAYGYSLQHVISEESFL 

LNSV--TPIKRVLSLTLVSFLASLSlAr^LQIKPKQITSIKQQWFKDFSVKKSPYWLHIGHAFr,QLIYVGTGGPIGKEGA
1= «|:.|. I •■ ■ 11= = II I I = 1 = 1 I I :|| = : II I hh! I 

KGSMAASPLRRLEVLVFCGAWGGGWGLLRHFGSPLVSITQAVAMK---RVMPFWTTIIHVLLQIVTVGLGSPLGREVA 



732 762 792 822 852 882 912 942 

PREFGAINAGKI SDLIiALKVLDKRLLI I SGAAAGLSAWQVPLAS VFFAFETLALG I SLKNI VTLLASTFGAAS IAQLVI 
10 |||:|:: : : | :|:|: || ||:::|| |||: :||:| | : : :: | =: :| :| ::: 

PRELGSLIGERFAFWGGLSENQRRILVACGAGAGFASVYWPLSGALFA^EALLMTWASPWIVALLTSALSARMAWILL 
140 150 160 170 180 190 200 



972 1002 1032 1059 1089 1119 114S 1176 

STAPLYHISKMSLNSQSLAFMFLIVLCVTPIAIS - FRYLNQKVTERRIKNIKILLSLPWSLI -VSVLSIVYPQILGNGN

: :||: :::: | |: : II ||: :||:| I 11= = I : :: : = : I I : : I : I I I I I 
GNSMVYHVPAWPVDTR-LMUJUJIAGPIFGIA^FRMSQ^ 

220 230 240 250 260 270 280 

1206 1233 1263 1293 1323 1353 1383 1413

AliVQEVFKGTTVSLIA-ILVVLKMIATLSTLYAGAYGGILTPSFSIGACLGFLLASISIPLLPHISIVTSMLVGAAIFLA 
II : I I :|::| : hlllllhlll I I I I - = II = I -II III 

GPVSIiAFNDmSGMKAGELFCFKIIAVFLALWAGAYGGLLTPGISFG^ 

300 310 320 330 340 350 360 


1443 1473 1503 1533 1563 1593 1623 1653 

ITMRAPLTAVGLVISFTGQSVITIVPLTIAVJ J FATAYDYFIRKMRSLYVNPY*SKTR*NCR*NFTSRRSTPCEIYCREFF 

:|= hlh III I :«|. II : I I 

SSMKMPITAMALVIEFARTGHDFLIPIAFAVAGSIAISQFYDQKXQPKTASKSVISHLGG 
380 390 400 410 420

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 902 

A DNA sequence (GBSx0957) was identified in S.agalactiae <SEQ ID 2749> which encodes the amino
acid sequence <SEQ ID 2750>. This protein is predicted to be purine nucleoside phosphorylase, fragment
(deoD-1). Analysis of this protein sequence reveals the following:
Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2384 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC18350 GB:Y17900 putative purine-nucleotide phosphorylase 
[Streptococcus salivarius] 
Identities = 200/236 (84%) , Positives = 219/236 (92%) 

Query: 1 MSIHIEAKQGEIADKILLPGDPLRAICFIAENFLEDAVCFNTVRNMFGYTGTYKGHRVSVM 60
MSIHI AKQGEIADKILLPGDPLRAKFIAENFLEDAVCFN VRNMFGYTGTYKG RVSVM
Sbjct: 1 MSIHIAAKO^aEIADKILLPGDPLRAKFIAENFLEDAVCFWEVKNMFGYTGTYKGERVSviyi 60

Query: 61 GTGMGMPS I S I YARELI VDYGVKTtilRVGTAGAINPDIHVRELVLAQAAATNSNI IRNDW 120
GTGMGMPSISIYARELIVDYGVK LIRVGTAG++N D+HVRELVLAQAAATNSNI IRNDW
Sbjct: 61 GTGMGMPSISIYARELIVDYGVKKLIRVGTAGSIiNEDVHTOELVLAQAAAraSNIIRNDW 120

Query: 121 PEFDFPQIADFKLLDI^YHIAKEMDITTHVGSVLSSDVFYSKQPDRNMAIjGKLGVHAIEM 180
P++DFPQIA+F LLDKAYHIAK +TTHVG+VLSSDVFYSN ++N+ LGK GV A+EM
Sbjct: 121 PQYDFPQIANFNLLDKAYHIAKNFGMTTHVGNVLSSDVFYSNYFEKNIELGKWGVKAVEM 180

Query: 181 EAAALYYLAAQHNVhUVIAMMTISDNliTOPEEOTSAEERQTT 236
EAAALYYLAAQH V+ALA+MTISD+L NP+EDT+AEERQ TFTDMMKVGLETLI++
Sbjct: 181 EAAALYYI^QHQVDAIAim'ISDSLVNPDEDTTAEEROirrFTDMMKVGLETLIAD 236

A related DNA sequence was identified in S.pyogenes <SEQ ID 2751> which encodes the amino acid
sequence <SEQ ID 2752>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.2117 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 210/235 (89%), Positives = 226/235 (95%)



Query: 1 MSIHI EAKQGEI ADKI LLPGDPLRAKFIAENFLEDAVCFOTT/RNMFGYTGTYKGHRVSVM 60 

MSIHI AK+G+ IADKI LLPGDPLRAKFIAENPLEDAVCFN VRNMPGVTGTYKGHRVSVM 
Sbjct: 1 MSIHI SAKKGDIADKILLPGDPIJiAKPIAENFLEDAVCFNEVRNMFtjYTGTyKGHRVSVM 60 

Query: 61 GT3M3M PS I S I YARE L I VD YGVKTL I R VGTAGAI NPD I HVRELVIAQAAATNSN 1 1 RNDW 120 

GTGNK>IPSISIVARELIVDYGvTCrLIRVGTAGAI+P++HVRELVLAQAAATNSNIIRND-f 
Sbjct: 61 GTGMGMPSISI YARELIVDYGV)CrLIRVGTAGAIDPEWVRELVIAQAAATNSNIIRNDF 120 

Query: 121 PEFDFPQI ADFKLLDKAYH I AKEMD I TTHVGSVLSSDVFYSNQPDRNMAIjGKIjGVHAI EM 180 

PEFDFPQIADF LLDKAYH I A+ EM +TTHVG+VLSSDVFY+N P+RNMALGKLGV AIEM 
Sbjct: 121 PEFDFPQIADFGI^KAYIUAREMGVTTHVGNVI^SDVFyTNM^ 180 

Query: 181 EAAALYYLAAQHNVNALAMMTI SDfrtiNNPEEDTSAEERQTTFTDMMlCVGLETIi I S 235 

EAAALYYIAAQH+V AL +MTISDNLN+P EDT+AEERQTTFTDMMKVGLETLI + 
Sbjct: 181 EAAALYYIAAQHHVTCAMIMTISDNI^PTEDTTAEE^ 235 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 903 

A DNA sequence (GBSx0958) was identified in S.agalactiae <SEQ ID 2753> which encodes the amino 
acid sequence <SEQ ID 2754>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1710 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9881> which encodes amino acid sequence <SEQ ID 9882>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 2755> which encodes the amino acid
sequence <SEQ ID 2756>. Analysis of this protein sequence reveals the following:
Possible site: 21

>>> Seems to have no N-terminal signal sequence




Final Results 

bacterial cytoplasm --- Certainty=0.1386 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 126/253 (49%) , Positives = 175/253 (68%) , Gaps = 2/253 (0%) 

Query: 3 IEMTDFSTALKVLVDQYSYHNAFLLLQKHGPLNSDLLFLLEMMKERRELNIDPLFAHQEQ 62 
+ MT+ T L +L+D Y+Y++AF + + + L+LLEM+KERRELN-f FL H +

Sbjct: 1 LPMTNNQT-LDILLDVYAYNHAFRIAKALPNIPKTALYLLEMLKERRELNLAFLAEHAAE 59 

Query: 63 WILQEKYNIKL-LHNPYDLELLANYIMDjEAICVI<NGLIIDFVRSVSPILYRLFMILLAQ 121 
++-M-Y+ L L+ + E +ANYI+DLE KVKNG I IDFVRS VSPILYRLF+ L+ 
Sbjct: 60 NRTIEDQYHCSLWLNQSLEDEQIANYILDLEVKVKNGAIIDFVRSVSPILYRLFLRLITS 119

Query: 122 EVPHLHDYIHNAFJ)DHYDTWKFKELKESNHPVLLAFSERWHDSRLTSKSLAECLQLTDLD 181 

E+P+ YI + ++D YDTW F+ + ES+H V A+ + +T+KSLA+ L LT L 

Sbjct: 120 EIPNFKAYIFDTKNDQYDTWHFQAMLESDHEVFKAYLSQKQSRNVTTKSLADMLTLTSLP 179 


Query: 182 EEVKSTI IQLRQFEKSVRNPLAHLI KPFDEQELYRTTQFSSQAFLDQI I FLAKVIGVEYD 241 

+E+K + LR FEK+VRNPLAHLI KPFDE+EL+RTT FSSQAFL+ II LA GV Y 
Sbjct: 180 QEIICDLVFLLRHFEKAVRNPLAHLIKPFDEEELHRTTHFSSQAFLENIITIATFSGVIYR 239

Query: 242 TVNFHYDTVNKLI 254

F++D +N +1 
Sbjct: 240 REPFYFDDMNAII 252 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
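
As an illustrative aside that is not part of the original analysis, the summary line that heads an alignment such as the GAS/GBS alignment above (Identities = 126/253, Positives = 175/253, Gaps = 2/253) can be read mechanically. The following is a minimal Python sketch under stated assumptions: the names SUMMARY_FIELD, parse_summary and exact_percent are hypothetical helpers introduced here, and the input is assumed to be a cleaned summary line. The whole-number percentages printed in the source (49%, 68%, 0%) are not reproduced by simple rounding of these ratios, so the sketch reports the exact ratios to one decimal place and does not attempt to imitate the source's own rounding.

```python
import re

# Hypothetical pattern: one "Name = count/length" field of an alignment summary line.
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_summary(line):
    """Return {field name: (count, alignment length)} for every field found."""
    return {
        name: (int(count), int(length))
        for name, count, length in SUMMARY_FIELD.findall(line)
    }


def exact_percent(count, length):
    """Exact ratio as a percentage, rounded to one decimal place."""
    return round(100 * count / length, 1)


# Summary line copied from the alignment above (spacing as printed in the source).
line = "Identities = 126/253 (49%) , Positives = 175/253 (68%) , Gaps = 2/253 (0%)"

fields = parse_summary(line)
for name, (count, length) in fields.items():
    print(f"{name}: {count}/{length} = {exact_percent(count, length)}%")
```

All three fields share the same denominator, the alignment length, which gives a quick consistency check when a summary line has been damaged in transcription.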

Example 904 

A DNA sequence (GBSx0959) was identified in S.agalactiae <SEQ ID 2757> which encodes the amino 
acid sequence <SEQ ID 2758>. This protein is predicted to be CpsY protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 260 - 276 ( 260 - 276)

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9879> which encodes amino acid sequence <SEQ ID 9880>
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 2759> which encodes the amino acid 
sequence <SEQ ID 2760>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1958 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 247/301 (82%) , Positives = 274/301 (90%) 



Query: 61 GMEFLSYMQILEQTALLEERYKGDNTSRELFSVESQHYAFWNAFVALFNGTDMTQYEL 120 

G+EFLSYARQI +EQT+LLE+RYK NT RELFSVSSQHYAFWNAFV+L TDMT+YEL 
Sbjct: 61 GVEFLSYARQIIEQTSLLEDRYICNHNTGRELFSVSSQHYAFVWAFVSLLKRTDMTRYEL 120 

Query: 121 FLRETRTWE1IDDVKNFRSEIGVLFLNSYNRDVLTKLFDDNSLIATTLFTTTPHIFVSKS 180 

FLRETRTWEI IDDVKNFRSEIGVIiF+N YNRDVLTKLFDDN L A+ LF PHIFVSKS 
Sbjct: 121 FLRETRTWEIIDDVKNFR3EIGVLFINDYNRDVLTKLFDDNHLTASPLFKAQPHIFVSKS 180 

Query: 181 NPLANRKKLNMKDLEDYPYLSYDQGLHNSFYFSEE^ISQIPHPKSIWSDRATLFNLMIG 240 

NPLA + L+N DL D+PYLSYDQG+HNSFYFSEEMMSQ+PH KSIWSDRATLFNLMIG 
Sbjct: 181 NPLATKSLLSMDDLRDFPYLSYDQGIHNSFYFSEEMMSQMPHNKSIWSDRATLFNLMIG 240 

Query: 241 LDGYTVATG1LNSKLNGDEIVAIPLDVDDVIDIVYIRHDKANLSKMGQKFIDYLLEEVSFN 301 

LDGYTVA+GILNS LNGD+IVAIPLDV D 1DIV+I+H+KANLSKMG++FI+YLLEEV+F+ 
Sbjct: 241 LDGYTVASG1LNSNLNGDQIVAIPLDVPDE1DIVFIKHEKANLSKMGERFIEYLLEEVTFD 301 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 905 

A DNA sequence (GBSx0960) was identified in S.agalactiae <SEQ ID 2761> which encodes the amino 
acid sequence <SEQ ID 2762>. This protein is predicted to be CpsX protein. Analysis of this protein 
sequence reveals the following: 
Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-14.91 Transmembrane 22 - 38 ( 13 - 42)
INTEGRAL Likelihood =-14.65 Transmembrane 52 - 68 ( 44 - 77)
INTEGRAL Likelihood = -6.74 Transmembrane 76 - 92 ( 73 - 97)

Final Results

bacterial membrane --- Certainty=0.6965 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC44935 GB:U56901 putative transcriptional regulator [Bacillus subtilis]
Identities = 120/389 (30%), Positives = 196/389 (49%), Gaps = 17/389 (4%)



+G SDS+I+VT++PK K M S+ RDT 





2 


Sbjct: 


19 




61 


Sbjct: 


78 


Query: 




Sbjct: 


131 


Query: 


181 


Sbjct: 


191 


Query: 


241 



VEAKlNAAYAAGGAQ^IMOTQDLliNITIDk^'QINMCGLIDLWA 180 

+ K+NAAY+ GG + TV++ L I ID YV ++ G D++N VGGI V FDF 
SKTKINAAYSKGGKDETVETvENFLQIPIDKYVTVDFDGFKDVINEVGGIDVDVPFDFDE 190 



+NGE+AL YARMR D GD+GR RQ++++ ++ ++ •* 



S N++TNI 1+ 






Sbjct: 250 SNIAKIDKIAEKASENVETNIRITEGLALQQIY3GFTSKKIDTLSITGSDLYLGPNNTYY 309

Query: 301 IVTSOTILLEIQmiRTELGLHK^QLKTOATWENLYGSTKSQTVNNNYDSSGQAPSYSD 360 
BE ++R L H ++ +T T S + + + S+G + 

Sbjct: 310 FEPDATNLE KVRKTLQEH-LDYTPDTSTGTSGTEDGTDSSSSSGSTGSTGTTTDGTT 365

Query: 361 SHSSYANYSSGVDTGQSASTDQDSTASSH 389 

+ SSY+N SS T + ST +T SS+ 
Sbjct: 366 NGSSYSNDSS TSSNNSTTNSTTDSSY 391 


There is also homology to SEQ ID 2764. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 906 

A DNA sequence (GBSx0961) was identified in S.agalactiae <SEQ ID 2765> which encodes the amino
acid sequence <SEQ ID 2766>. This protein is predicted to be CpsIaB. Analysis of this protein sequence
reveals the following:
Possible site: 41

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.75 Transmembrane 121 - 137 ( 121 - 137)

Final Results

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9877> which encodes amino acid sequence <SEQ ID 9878> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
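
By way of illustration only, a "Final Results" block of the kind reported in this Example can be tabulated automatically. The sketch below is a minimal Python example and is not part of the original analysis pipeline: RESULT_LINE, parse_final_results and best_location are hypothetical names introduced here, and the input is assumed to be cleaned text in which each certainty value is written without internal spaces.

```python
import re

# Hypothetical pattern for one cleaned result line, for example
# "bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>".
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Map each predicted location to a (certainty, verdict) pair."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        location, certainty, verdict = match.groups()
        results[location] = (float(certainty), verdict.strip())
    return results


def best_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


# Values copied from the Final Results reported for GBSx0961 above.
report = """
bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
print(best_location(results), results[best_location(results)])
```

Selecting the maximum certainty is an assumption made for this sketch; in every block of this section only one location carries a non-zero certainty, so the choice is unambiguous there.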

Example 907 

A DNA sequence (GBSx0962) was identified in S.agalactiae <SEQ ID 2767> which encodes the amino
acid sequence <SEQ ID 2768>. This protein is predicted to be cpsb protein. Analysis of this protein
sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.02 Transmembrane 182 - 198 ( 179 - 204)
INTEGRAL Likelihood = -5.57 Transmembrane 30 - 46 ( 24 - 48)

Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 10785> and protein <SEQ ID 10786> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 






McG: Discrim Score: -8.95
GvH: Signal Score (-7.5): 0.11
Possible site: 35
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -9.02 threshold: C
INTEGRAL Likelihood = -9.02 Transmembrane
INTEGRAL Likelihood = - Transmembrane
PERIPHERAL Likelihood = 6.21
modified ALOM score: 2.30

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 908 

A DNA sequence (GBSx0963) was identified in S.agalactiae <SEQ ID 2769> which encodes the amino 
acid sequence <SEQ ID 2770>. This protein is predicted to be CpsIaD. Analysis of this protein sequence 
reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.44 Transmembrane 149 - 165 ( 149 - 166)

Final Results

bacterial membrane --- Certainty=0.1977 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 909 

A DNA sequence (GBSx0964) was identified in S.agalactiae <SEQ ID 2771> which encodes the amino 
acid sequence <SEQ ID 2772>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -3
INTEGRAL Likelihood = -3

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8687> and protein <SEQ ID 8688> were also identified. Analysis of this 
protein sequence reveals the following: 




Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: 5.69 
GvH: Signal Score (-7.5): -5.63 
Possible site: 25 



>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -12.26 threshold: 0.0
INTEGRAL Likelihood =-12.26 Transmembrane 276 - 292 ( 270 - 297)
INTEGRAL Likelihood = -4.62 Transmembrane 10 -  ( 9 - 28)
INTEGRAL Likelihood = -4.14 Transmembrane 41 - 57 ( 39 - 58)
INTEGRAL Likelihood = -3.24 Transmembrane 100 -  ( 100 - 116)
INTEGRAL Likelihood = -3.08 Transmembrane 445 - 461 ( 443 - 451)
PERIPHERAL Likelihood = 2.23 221
modified ALOM score: 2.95

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 910 

A DNA sequence (GBSx0965) was identified in S.agalactiae <SEQ ID 2773> which encodes the amino
acid sequence <SEQ ID 2774>. This protein is predicted to be CpsF. Analysis of this protein sequence
reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.60 Transmembrane 79 - 95 ( 78 - 95)

Final Results

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
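
For illustration only, the INTEGRAL lines reported in these Examples can be converted into structured records, for instance to list the predicted membrane-spanning segments of a protein. The following minimal Python sketch is an assumption-laden aid and not the original analysis program: Segment, SEGMENT_LINE, parse_segments and core_length are hypothetical names introduced here, and the meaning of the range printed in parentheses is not stated in this text, so it is stored without interpretation.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    """One INTEGRAL line: likelihood, printed span, and the range in parentheses."""
    likelihood: float
    start: int
    end: int
    paren_start: int
    paren_end: int


# Hypothetical pattern for a cleaned line such as
# "INTEGRAL Likelihood = -2.60 Transmembrane 79 - 95 ( 78 - 95)".
SEGMENT_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?[0-9.]+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return a Segment for every complete INTEGRAL line found in the text."""
    return [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in SEGMENT_LINE.findall(text)
    ]


def core_length(segment):
    """Number of residues in the printed span, counting both end points."""
    return segment.end - segment.start + 1


# Line copied from the analysis of GBSx0965 above.
report = "INTEGRAL Likelihood = -2.60 Transmembrane 79 - 95 ( 78 - 95)"

segments = parse_segments(report)
for seg in segments:
    print(seg, "core length:", core_length(seg))
```

Lines whose numbers were lost in transcription do not match the pattern and are skipped, which keeps incomplete records out of any downstream table.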

Example 911

A DNA sequence (GBSx0966) was identified in S.agalactiae <SEQ ID 2775> which encodes the amino 
acid sequence <SEQ ID 2776>. This protein is predicted to be galactosyltransferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4634 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes. 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 912 

A DNA sequence (GBSx0967) was identified in S.agalactiae <SEQ ID 2777> which encodes the amino 
acid sequence <SEQ ID 2778>. Analysis of this protein sequence reveals the following: 



Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.
INTEGRAL Likelihood =-10.
INTEGRAL Likelihood = -8.
INTEGRAL Likelihood = -6.
INTEGRAL Likelihood = -6
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -3
INTEGRAL Likelihood = -2
INTEGRAL Likelihood = -2
INTEGRAL Likelihood = -1
INTEGRAL Likelihood = -0

Transmembrane - 75 ( 54 -
Transmembrane - 325 ( 307 -
Transmembrane - 49 ( 28 -
Transmembrane - 211 ( 187 -
Transmembrane - 301 ( 283 -
Transmembrane - 238 ( 221 -
Transmembrane - 94 ( 77 -
Transmembrane - 117 ( 99 -

Final Results

bacterial membrane --- Certainty=0.5989 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB43614 GB:AJ239004 polysaccharide polymerase [Streptococcus pneumoniae] 
Identities = 74/309 (23%), Positives = 137/309 (43%), Gaps = 35/309 (11%) 

FERRKLV IIFLLPIATILNLFFVHKVTFILTLI FFLALKDI - - SLKKAFS 1 1 IGSRI 107 

FE+RK II ++ I T+L + ++ +F+ + 1 L++ II 

FEKRKYTLQFIISIILITTLLLYTSIQMQNYVYFTSWFMLIGTIHYDLRRVIKIIFIVS- 119 



Query: 


53 


Sbjct: 


61 




108 


Sbjct: 


120 




163 


Sbjct: 


179 


Query: 


223 


Sbjct: 


237 




275 


Sbjct: 


297 




325 


Sbj ct: 


353 



- -TSVLFDNSYSMLLSMYGWLTMFCMI IY YI YSKKII I IELQLLLFIMS II 324 

TS FD+ YS L+S G++ + +++ Y+ +K +1+ LL + M + 
3LTSFTFDSFYSFLMSNAGIIWLLILSVLFVKLQKYLDNKSLIL LLAWSMYAV 352 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 913

A DNA sequence (GBSx0968) was identified in S.agalactiae <SEQ ID 2779> which encodes the amino 
acid sequence <SEQ ID 2780>. This protein is predicted to be cap8J. Analysis of this protein sequence 
reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3424 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB43613 GB:AJ239004 cap8J [Streptococcus pneumoniae] 
Identities = 94/237 (39%) , Positives = 135/237 (56%) , Gaps = 10/237 (4%) 

Query: 1 MIPKVIHYCWFGGNPLPDNLKKYIKTWREQCPDYEIIEVINEHNYDVSKNVFMREAYTKKN 60 

MIPK IHY WFGG+ PD + K I +W++ PDYEI+EWNE N+D+S + F + AY + 
Sbjct: 1 MIPKKIHYIWFGGSEKPDVVLKCINSWKKYMPDYEIVEWNEDNFDLSDSQFAKSAYESRK 60 

Query: 61 FAYVSDYARLDIIYTYGGFYLDTDVELLKSL-DPLRIHECFLAREISCDVNTGLIIGAVK 119 

+A+ SDYAR 1+ YGG Y DTDVELLK++ D + H F E +VN GL+ + 
Sbjct: 61 WAFASDYARFKILSKYGGIYFDTDVELLKTISDDILAHSSFTGFEYIGEVNPGLVYACMP 120 

Query: 120 GHHFLKSNMSIYDKS--DLTSLNKTCVEVTT^LINRGLKNKNIIQKIDDITIYPRNYFN 177 

K + Y+++ D+ h T + T4 L+ + N Q ID + IYP +YF 
Sbjct: 121 DDKIAKYMVQYYEQASFDIinrXi- VTVNTIITDYLLKNNFQKNNQFQIIDGLAIYPDDYFC 179 

Query: 178 E 

+ +V LT T SIHHY +WK+ 
Sbjct: 180 GYDQEVKEVR-LTERTISIHHYSATWKTR — 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 914 

A DNA sequence (GBSx0969) was identified in S.agalactiae <SEQ ID 2781> which encodes the amino 
acid sequence <SEQ ID 2782>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3897 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA87700 GB:Z47767 WbcL [Yersinia enterocolitica] 
Identities = 60/207 (28%), Positives = 101/207 (47%), Gaps = 22/207 (10%) 

Query: 4 IFTPTFmGYRLSYLYDSLCNQTNKNFIWLITODGSHlDSTKEIVSNYIKENKVSIVYLYK 63 

+FTPTFNR + L Y S+ Q + WLIVDDGS D+T E+V ++ ENK++I Y+Y+ 
Sbjct: 6 VFTPTFNRAHVLKRCYLSILEQDRDDIEWLITODGSTDNTAEWDSFKIENKIjNIKYIYQ 65 



Query: 64 RNGGKHSAYNLAMRYMQPSDYHVCVDSDDWLLEDAV EI I FKDLESLTLSNRYVG 117

N GK +A+N A+ +Y + +DSDD + ++ +F D E + + 




Sbjct: 66 DNSGKQARWNKAVENAS-GEYFIGLDSDDAFIAGSINKLLSMNAVFDDKEIIGIR A 120 

Query: 118 LWPRYSLNQGNDTOIiNPKILEVNIPDLKYKYHLKIETCIVINMAYLVDFEFPCFEGENFL 177 

+ +L N +L+ + + + D ++ ++ E L + +P G NF+ 

Sbjct: 121 ISVSSETLKPNNYYLSNEDKKSSWFD-EFSSGIRGERIDFFKTELLRKfLYPVASGlNFI 179 

Query: 178 SEEIMYIYLSKKGYFCPQNRKIYCFDY 204 

E Y ++K+ YCF Y 

Sbjct: 180 PEIWFYSTVAKE YCFYY 196 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 915 

A DNA sequence (GBSx0970) was identified in S.agalactiae <SEQ ID 2783> which encodes the amino 
acid sequence <SEQ ID 2784>. This protein is predicted to be eps7. Analysis of this protein sequence 
reveals the following: 
Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.18 Transmembrane 190 - 206 ( 189 - 206)

Final Results

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB59293 GB:AJ131984 putative galactosyl transferase 
[Streptococcus pneumoniae] 
Identities = 101/312 (32%), Positives = 172/312 (54%), Gaps = 4/312 (1%) 

LISIIVPVYNGEIYIGRCLDSILEQTYQNLEIIIIDDGSSDRTCDICEKYFLEDRRIKYF 52 
+IS+IVPVYN Y+ LDS+LEQTY++ E+I+++DGS+D +G+IC++Y I F 

MISVIVPVYMVADYLRFALDSLLEQTYKDFEVILVMDGSTDNSGEICDEYGKLYDNIHVF 60 

YQENRGQSVARNNGVLRCTGDWIAFLDSDDVYLPYSIEVMYNIQKATNADI VLT- -SIGN 12 0 
+++N G S ARN G+ + G++I FLDSDD + PY++E++ IQK + DIV T I 



Query: 
Sbjct: 


3 
1 




63 


Sbjct: 


61 


Query: 


121 


Sbjct: 


121 


Query: 




Sbjct: 


180 


Query: 


241 


Sbjct: 


240 


Query: 


301 


Sbjct: 


299 



f Y+ ++ N +1 +K KV 



+ +LF+ S Y+ 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 916

A DNA sequence (GBSx0971) was identified in S.agalactiae <SEQ ID 2785> which encodes the amino 
acid sequence <SEQ ID 2786>. This protein is predicted to be galactosyltransferase. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2787> which encodes the amino acid
sequence <SEQ ID 2788>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2065 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 37/111 (33%) , Positives = 61/111 (54%) , Gaps = 3/111 (2%) 

Query: 1 MDKVSIIIPVYWQSFLI^CIESVIAQ-TYSNLEIILVNDGSTDNSGDIC-DYYSEIDGR 58 

M KVSII YN ++++ ++S L+Q T +EII+++D STD+S +1 Y + G+ 
Sbjct: 1 MY KVS 1 1 CTNYNKAPWI SDALDSFLSQVTDFEVEI IVIDDASTDDSRE I LKS YQKKSSGK 60 

Query: 59 I-FVFHKNNGGLSDARNYGISRATGDYIYLLDSDDYLYKEDAIERMVEFSE 108 

I +F++ N G++ A G YI D DDY +++ V+ E 

Sbjct: 61 IKLLFNETNIGITKTWXKACLYAKGKYIARCDGDDYWTDSFKLQKQVDVLE 111 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 917 

A DNA sequence (GBSx0972) was identified in S.agalactiae <SEQ ID 2789> which encodes the amino 
acid sequence <SEQ ID 2790>. This protein is predicted to be CpsK. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 918

A DNA sequence (GBSx0973) was identified in S.agalactiae <SEQ ID 2791> which encodes the amino
acid sequence <SEQ ID 2792>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1956 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 919 

A DNA sequence (GBSx0974) was identified in S.agalactiae <SEQ ID 2793> which encodes the amino 
acid sequence <SEQ ID 2794>. This protein is predicted to be capsular polysaccharide. Analysis of this 
protein sequence reveals the following: 



N-term signal seq
INTEGRAL Likelihood = -8
INTEGRAL Likelihood = -7
INTEGRAL Likelihood = -6
INTEGRAL Likelihood = -4
INTEGRAL Likelihood = -3
INTEGRAL Likelihood = -3
INTEGRAL Likelihood = -2
INTEGRAL Likelihood = -1
INTEGRAL Likelihood = -1
INTEGRAL Likelihood = -1
INTEGRAL Likelihood = -0

Transmembrane 21
Transmembrane - 105 ( 80 -
Transmembrane 439 - 455 ( 428 -
Transmembrane 322 - 338 ( 317 -
Transmembrane 175 - 191 ( 174 -
Transmembrane 146 - 162 ( 145 -
Transmembrane 381 - 397 ( 375 -
Transmembrane 413 - 429 ( 412 -
Transmembrane 206 - 222 ( 205 - 222)
Transmembrane 354 - 370 ( 354 - 372)
Transmembrane 252 - 268

Final Results

bacterial membrane --- Certainty=0.4524 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 920 

A DNA sequence (GBSx0975) was identified in S.agalactiae <SEQ ID 2795> which encodes the amino 
acid sequence <SEQ ID 2796>. This protein is predicted to be NeuB. Analysis of this protein sequence 
reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2992 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 921 

A DNA sequence (GBSx0976) was identified in S.agalactiae <SEQ ID 2797> which encodes the amino
acid sequence <SEQ ID 2798>. This protein is predicted to be NeuC. Analysis of this protein sequence
reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3150 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 922 

A DNA sequence (GBSx0977) was identified in S.agalactiae <SEQ ID 2799> which encodes the amino 
acid sequence <SEQ ID 2800>. This protein is predicted to be neuD. Analysis of this protein sequence 
reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

There is homology to SEQ ID 542. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 923 

A DNA sequence (GBSx0979) was identified in S.agalactiae <SEQ ID 2801> which encodes the amino
acid sequence <SEQ ID 2802>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 924 

A DNA sequence (GBSx0980) was identified in S.agalactiae <SEQ ID 2803> which encodes the amino 
acid sequence <SEQ ID 2804>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1621 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9875> which encodes amino acid sequence <SEQ ID 9876> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2805> which encodes the amino acid 
sequence <SEQ ID 2806>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1066 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 83/139 (59%) , Positives = 111/139 (79%) 



Query: 6 TETHDHQALIQKLLVSIHYLTLFRDEIILVEKTPSLLGKHFSIAIVQNELGEILSKIEAL 65

TE + HQ LIQKLLVS I HYLTLFRDE+ LVE+TPS+LG F +VQ+ELG+I++ 1+ L
Sbjct: 4 TEQNSHQILIQKLLVSIHYLTLFRDELKLVERTPSILGGEFPAHLVQSELGDIVAA.IDTL 63

Query: 66 SKQKKIiIRSIYWYDESSFKVMNKMAIVEEWIKGLDNLLEFCQSQTVFQAILGDERAHVF 125

Q++L1 S +WY+ES4-FK+MNK L IV+ WIKG+D+L4+ CQS+ VFQ I+GD+R VF
Sbjct: 64 DMQQRLIESTFWYEESAFKLMNKTLDIVDNWIKGVDELIDLCQSKEVFQIIIGDKRIRVF 123

Query: 126 GILIDVYTSLNI INTSLKE 144

G+L DV++SL + SLKE
Sbjct: 124 GVLSDVFSSLKVSALSLKE 142
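The summary line of each alignment gives counts and whole-number percentages. The sketch below parses such a line and recomputes the percentages with integer arithmetic. The rule used for Positives (truncate the identity share and the conservative-substitution share separately, then add them) is an inference from the figures printed in this section, not a rule stated in the specification. The names `parse_summary` and `recomputed_percents` are hypothetical.

```python
import re

# Matches "Identities = 83/139 (59%)"; a scan may print "-" where "=" was meant.
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*[=-]\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")

def parse_summary(line):
    """Map each label to (count, alignment_length, printed_percent)."""
    return {label: (int(n), int(d), int(p)) for label, n, d, p in SUMMARY.findall(line)}

def recomputed_percents(summary):
    """Recompute the Identities and Positives percentages with integer arithmetic."""
    ident, length, _ = summary["Identities"]
    pos, _, _ = summary["Positives"]
    ident_pct = ident * 100 // length
    # Assumption: conservative substitutions are truncated separately, then added on.
    pos_pct = ident_pct + (pos - ident) * 100 // length
    return ident_pct, pos_pct

for line in (
    "Identities = 83/139 (59%) , Positives = 111/139 (79%)",
    "Identities = 143/212 (67%) , Positives = 174/212 (81%)",
):
    summary = parse_summary(line)
    printed = (summary["Identities"][2], summary["Positives"][2])
    print(recomputed_percents(summary), printed)
```

A recomputed pair that differs from the printed pair suggests that a digit in the counts or percentages was damaged in scanning.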



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 925 

A DNA sequence (GBSx0981) was identified in S.agalactiae <SEQ ID 2807> which encodes the amino 
acid sequence <SEQ ID 2808>. This protein is predicted to be uracil-DNA glycosylase (ung). Analysis of 
this protein sequence reveals the following: 
Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3427 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
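Each "Final Results" block lists a certainty value for three locations, and the location with the highest value is the predicted one. A minimal sketch of reading such a block is shown below. The names `parse_final_results` and `predicted_site` are hypothetical, and the pattern is written to tolerate the stray spaces that scanning leaves inside the numbers.

```python
import re

# Tolerates spaces that a scan inserts inside the number ("0 .3427", "0 . 0000")
# and any run of dashes between the location and "Certainty".
RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([\d .]+?)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return {location: (certainty, verdict)} for one Final Results block."""
    results = {}
    for site, raw, verdict in RESULT.findall(text):
        results[site] = (float(raw.replace(" ", "")), verdict)
    return results

def predicted_site(results):
    """Pick the location with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])

block = """
bacterial cytoplasm Certainty=0 .3427 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(predicted_site(results), results)
```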






A related DNA sequence was identified in S.pyogenes <SEQ ID 2809> which encodes the amino acid
sequence <SEQ ID 2810>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4200 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 160/216 (74%) , Positives = 185/216 (85%) 

Query: 1 MKHSSTODLIKRELPNHyVNKINTFMDAVYSSGIVYPPRDKVFNAIQITPLENVKWIIG 60 

M HS WH+ IK LP HYY +IN F+D Y SG+VYPPR+ VF A+Q+TPLE KV+I+G 
Sbjct: 1 MAHSIWHEKIKSFLPEHYYGRINHFLDEAYASGLVYPPRENVFKALQVTPLEETKVLILG 60 

Query: 61 QDPYHGPQQAQGLSFSVPDNLPAPPSLQNILKELAEDIGSRSHHDLTSWAQQGVLLLNAC 120 

QDPYHGP+QAQGLSFSVP+ + APPSL NILKELA+DIG R HHDL++WA QGVLLLNAC 
Sbjct: 61 QDPYHGPKQAQGLSFSVPEEISAPPSLINILKELADDIGPRDHHDIjSTWASQGVLLIjNAC 120 

Query: 121 LTVPEHQANGHAGLIVffiPFTDAVIKVVNQKETPVVFILWGGYARKKKSLIDNPIHHIIES 180 

LTVP QANGHAGLIWEPFTnAVIKV+N+K++PWFILWG YARKKK+ I NP HHIIES 
Sbjct: 121 LTVPAGQANGHAGLIWEPFTDAVIKVLNEKDSPVVFILWGAYARKKKAFITNPKHHIIES 180 

Query: 181 PHPSPLSAYRGFFGSRPFSRTNHFLEEEGINEIDWL 216 

PHPSPLS+YRGFFGS+PFSRTN LE+EG+ +DWL 
Sbjct: 181 PHPSPLSSYRGFFGSKPFSRTNAILEKEGMTGVDWL 216 
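Each alignment row carries its own start and end positions, so the residue count of a row can be checked against them. The sketch below does this for single rows; a mismatch points to characters lost or merged during scanning. The function name `check_row` is hypothetical, and the check assumes that spaces inside a row are scanning artefacts and that dashes are alignment gaps.

```python
import re

ROW = re.compile(r"(Query|Sbjct):\s*(\d+)\s+(.+?)\s+(\d+)\s*$")

def check_row(line):
    """Return (label, start, end, residues, consistent) for one alignment row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    label, start, seq, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    # Spaces come from the scan; dashes are alignment gaps and consume no residue.
    residues = len(seq.replace(" ", "").replace("-", ""))
    return label, start, end, residues, residues == end - start + 1

print(check_row("Sbjct: 181 PHPSPLSSYRGFFGSKPFSRTNAILEKEGMTGVDWL 216"))
print(check_row("Query: 1 MKHSSTODLIKRELPNHyVNKINTFMDAVYSSGIVYPPRDKVFNAIQITPLENVKWIIG 60"))
```

The first row is internally consistent. The second is flagged because the printed residues do not add up to the 60 positions that the row claims to cover.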



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 926 

A DNA sequence (GBSx0982) was identified in S.agalactiae <SEQ ID 2811> which encodes the amino
acid sequence <SEQ ID 2812>. Analysis of this protein sequence reveals the following:



Possible site: 20

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.15 Transmembrane
INTEGRAL Likelihood = -8.92 Transmembrane
INTEGRAL Likelihood = -6.16 Transmembrane
INTEGRAL Likelihood = -4.67 Transmembrane
INTEGRAL Likelihood = -3.38 Transmembrane
INTEGRAL Likelihood = -1.06 Transmembrane
INTEGRAL Likelihood = -0.90 Transmembrane



106 - 122 



Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9873> which encodes amino acid sequence <SEQ ID 9874> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MNI I IMI 1 1AYLLGSIQTGLW I GKYF YQVNLRQKGSGNTGTTNTFRILGVKAGI VTLTID 60

Query: 61 ILKGTLATLIPIILGITOTSPFFIGFFAIIGHTFPIFAQFKGGKAVATSAGVLLGFAPSF 120

KGTLATL+PII + VSP G A+IGHTFPIFA FKGGKAVATSAGV+ GFAP F 
Sbjct: 61 FFKGTLATLLP1IFHLQGVSPLIFGLLAVIGHTFPIFAGFKGGKAVATSAGVIFGFAP1F 120 

Query: 121 FLYLLVT FLLTLYLFSMI SLSS ITvAVVGILSVIiIFPLVGF I LTDYDWI FTTWILMALT 180 

LYL +IF LYL SMISLSS+T ++ ++ VL+FPL GFIL++YD++F +++ +A 
Sbjct: 121 CLYIAIIFFGALYLGSMISLSSVTASIAAVIGVLLFPLFGFILSNYDFLFIAIILALASL 180 

Query: 181 IIIRHQDNIPCRIRKRQENLVPFGLNLSKQKNK 212 

IIIRH+DNI RI+ + ENLVP+GLNL+ Q K 
Sbjct: 181 1 1 IRHKDNIARI KNKTENLVPWGLNLTHQDPK 212 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2813> which encodes the amino acid
sequence <SEQ ID 2814>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood =-10 Transmembrane 194 - 210 ( 191 - 216)
INTEGRAL Likelihood = -9 Transmembrane 146 - 162 ( 132 - 191)
INTEGRAL Likelihood = -7 Transmembrane 165 - 181 ( 1S3 - 191)
INTEGRAL Likelihood = -5 Transmembrane 23 - 39 ( 19 - 47)
INTEGRAL Likelihood = -4 Transmembrane 95 - 111 ( 91 - 118)



Final Results

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA91S49 GB:Z67739 unidentified [Streptococcus pneumoniae] 
Identities = 138/213 (64%), Positives = 166/213 (77%) 

Query: 28 MKLLLFITIAYLLGSIPTGLWIGQYFYHINLREHGSGNTGTTNTFRILGVKAGTATLAID 87 

M ++ + +AYLLGSIP+GLWIGQ F+ INLREHGSGNTGTTNTFRILG KAG AT ID 
Sbjct: 1 MITIVLLIIAYLLGSIPSGLWIGQVFFQINLREHGSGNTGTTNTFRILGKKAGMATFVID 60 

Query: 88 MFKGTLS I LLP 1 1 FGMTS I SS IAIGFFAVLGHTFPI FANFKGGKAVATSAGVLLGFAPLY 147 

FKGTL4 LLPIIF + 4S + G AV+GKTFPI FA FKGGKAVATSAGV+ GFAP++ 
Sbjct: 61 FFKGTLATLLPIIFHLQGVSPLIFGLLAVIGHTFPIFAGFKGGKAVATSAGVIFGFAPIF 120 

Query: 148 LFFLASIFVLVLYLFSMISLASWSAIVGVLSVLTFPAIHFLLPNYDYFLTFIVILLAFI 207 

+LA IF LYL SMISL+SV ++I V4 VL FP F+L NYD+ I++ LA + 
Sbjct: 121 CLYI1SIIFFGALYLGSMISLSSVTASIAAVIGVI1LFPLFGFILSNYDFLFIAIILAI1ASL 180 

Query: 208 IIIRHKDNISRIKHHTENLIPWGLNLSKQVPKK 240 

IIIRHKDNI+RIK+ TENL+PWGLNL+ Q PKK 
Sbjct: 181 IIIRHKDNIARIKNKTENLVPWGLNLTHQDPKK 213 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 143/212 (67%) , Positives = 174/212 (81%) 

Query: 1 MNI I IMI I IAYLLGSIQTGLWIGKYFYQvNLRQHGSGNTGTTNTFRILGVKAGIVTLTID 60 

M +++ I IAYLLGSI TGLWIG+YFY +NLR+HGSGNTGTTNTFRILGVKAG TL ID 
Sbjct: 28 MKLLLFITIAYLLGSIPTGLWIGQYFYH1NLREHGSGNTGTTNTFRILGVKAGTATLAID 87 

Query: 61 ILKGTLATLIPIIIuGITTVSPFFIGFFAIIGHTFPIFAQFKGGKAVATSAGVLLGFAPSF 120 

+ KGTL+ L+PII G+T++S IGFFA++GHTFPIFA FKGGKAVATSAGVLLGFAP + 
Sbjct: 88 MFKGTLS I LLPI IFGMTS I SS IAIGFFAVLGHTFPI FANFKGGKAVATSAGVIjLGFAPLY 147 

Query: 121 FLYLLVIFLLTLYLFSMISLSSITVAVVGILSVLIFPLVGFILTDYDWIFTTVVILMALT 180 

+L IF+L LYLFSMISL+S+ A+VG+LSVL FP + F+L +YD+ T +VIL+A 
Sbjct: 148 LFFLASIFVLVLYLFSMISLASVVSAIVGVLSVLTFPAIHFLLPNYDYFLTFIVILLAFI 207

Query: 181 IIIRHQDNIKRIRKRQENLVPFGLNLSKQKNK 212

IIIRH+DNI RI+ ENL+P+GDNLSKQ K 
Sbjct: 208 IIIRHKDNISRIKHHTENLIPWGIiNLSKQVPK 239 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 927 

A DNA sequence (GBSx0983) was identified in S.agalactiae <SEQ ID 2815> which encodes the amino 
acid sequence <SEQ ID 2816>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 928

A DNA sequence (GBSx0984) was identified in S.agalactiae <SEQ ID 2817> which encodes the amino 
acid sequence <SEQ ID 2818>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1585 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9871> which encodes amino acid sequence <SEQ ID 9872> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA91550 GB:Z67739 DNA topoisomerase IV [Streptococcus pneumoniae] (ver 2)
Identities = 574/649 (88%), Positives = 617/649 (94%), Gaps = 2/649 (0%) 

Query: 5 LAKQDITVTNYGDDAIQVLEGLDATOKRPGMYIGSTDGTGLHHLVWEIVDNAVDEALSGF 64 
++K++I + NY DDftlQVLEGLDAVRICRPGMYIGSTDG GLHHLVWEIVDNAVDEAIiSGF 

sbjct: i mskkeininm™3Daiqvlegldaw:<kpc-myigstdgaglhhlvweivdnavdealsgf 60 

Query: 65 GNRIDVIINKDGSITVTDHGRGMPTGMHBMGKPTVEVIFTVLffi 124 

G+RIDV INKDGS+TV DHGRGMPTGMHAMG PTVEVTFT+LHAGGKFGQGGYKTSGGLH 
Sbjct: 61 GDRIDVTINKDGSLWQDHGRGMPTGMHftMGIPTvEVIFTILHAGGKFGCGGYKrSGGLH 120 

Query: 125 GVGSSWNALSSWLEVEIIRDGAIYRQRFENGGKPVTTLKKIGTAPKSKSGTSVSFMPDQ 184 

GVGSSWNALSSWLEVEI RDGA+Y+QRFENGGKPvTTLKKIGTAPKSK+GT V+FMPD 
Sbjct: 121 GVGSSVVNALSSWLEVEITRDGAVYKQRFENGGKPVTTIjKKIGTAPKSKTGTKvTFMPDA 180 

Query: 185 SVFSTIDFKFNTIAERLKESAFLLKNVTlTLTDmSEEKEHLEFHYENGVQDFVEYLNED 244 
++FST DFK+NTI+ERL ESAFLLKNVTL+LTD R++EA +EFHYENGVQDFV YUJED 



Sbjct: 181 TIFSTTDFKYNTISERLNESAFLLKKVTLSLTDKRTDE?.- - IEFHYENGVQDFVSYLNED 238

Query: 245
KE LTP+++FEGE+ F +EVMjQYNDGFSDNILSFVKN\ r RTKDGGTHETGLKSAITK M
Sbjct: 239

Query: 305
HDYARKTGLLKEroiCNLEGSDyREGL+A+LSILVPEEHLQFEGQTKDKLGSPLARP+VDG
Sbjct: 299

Query: 365
1V++KLT+FLMENG+LASNLIRKAIKARDAREAARKARDESRNGKK+KKDKGJjLSGKLTP
Sbjct: 359

Query: 425
AQSKN KNELYLVEGDSAGGSAKQGRDRKFQAI LPLRGKV+NTAKAKMADI +KNEEINT
Sbjct: 419

Query: 485
MI+TIGAGVG DF+++D NYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVE G
Sbjct: 479

Query: 545
LPPLYKMSKGKGKKE V YAWTD ELEELR++FGKG+ LQRYKGLGEMNADQLWETTMNP
Sbjct: 539

Query: 605
ETRTLIRVTIEDLARAERRVNVLMGDKV PRR+WIEDNVKFTLEE TVF
Sbjct: 599



A related DNA sequence was identified in S.pyogenes <SEQ ID 2819> which encodes the amino acid 
sequence <SEQ ID 2820>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1518 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 560/649 (86%) , Positives = 615/649 (94%) 

Query: 5 ]^QDITVT^GDDAIQVLEGLDAVRKRPGMYIGSTDGTGLHHLVWEIVDHAVDEALSGF 64 

L K++IT+ NY DDAIQVLEGLDAVRKRPGMYIGSTD TGLHHL+WE I VDNAVDEALSGF 
Sbjct: 2 LTKKEITINNYNDDAIQVLEGLDAVRKRPGMYIGSTDATGLHHLIWEIVDNAVDEALSGF 61 

Query: 65 GWRIDVIINKDGSITVTDHGRGMPTGMHAMGKPTVEVIFTVLHAGGKFGQGGYKTSGGLH 124 

G+ I V+INKDGS++V D GRGMPTG HAMG PTV+VIFT+LHAGGKFGQGGYKTSGGLH 
Sbjct: 62 GDD IKWINKDGSVSVADSGRGMPTGQHAMGIPTVQVI FTI LHAGGKFGQGGYKTSGGLH 121 

Query: 125 GVGSSWNALSSWLEVEIIRDGAIYRQRFENGGKPVTTLKKIGTAPKSKSGTSVSFMPDQ 184 

GVGSSWNALS+WLEVEI RDG++YRQRFENGGKPVTTLKK+GTAPKSKSGT V+FMPD 
Sbjct: 122 GVGSSVWALSAWLEVEITRDGSVYRQRFENGGKPOTTLKKVGTAPKSKSGTVVTFMPDD 181 

Query: 185 SVFSTIDFKFNTIAERLKESAFLLKNVTuTLTDNRSEEAEHLEFHYENGVQDFVEYLNED 244 

+FSTIDFKFNTI+ERLKESAFLLKNV ++LTD R ++ EFHYENGVQDFVEYLNED 
Sbjct: 182 KIFSTIDFKFNTISERLKESAFLLKNVKMSDTDLRGDDPIIEEFHYENGVQDFVEYIJSIED 241 

Query: 245 KETLTPIMFFEGEEQEFHIEVALQYNDGFSDNILSFVNNVRTKDGGTHETGLKSAITKSM 304 

KETLTP+++ EG++Q+F +EVALQYNDGF8DNILSF\'NNVRTKDGG+HETGLKSAITK4M 
Sbjct: 242 KETLTPVIYMEGQDQDFQVEVALQYNTX5FSDNILSFVNNVRTKDGGSHETGLKSAITKAM 301 



Query: 305 NDYARKTGLLKEKDKNLEGSDYREGLSAILSILVPEEHLQFEGQTKDKLGSPLARPIVDG 364

NDYARKT LLKEKDKNLEGSDYREGLSA+LSILVPE+HLQFEGQTKDKLGSPLARPIV+
Sbjct: 302 ITOYARKTOLLKEKDKNLEGSDYREGLSAVLSILVPEQHLQPEGQTKDKLGSPLARPIVES 361

Query: 365 IVSEKLTYFLMENGDlASNLIRIQ^IKARDAREilWlKARDESRNGKKSKKDKGLLSGKLTP 424 

IVSEKLT+FL+ENG++AS+L+RKAlKARnAREAARKARD+SRHGKK+KKDKGLLSGKLTP 
Sbjct: 362 IVSEKLTFFLLENGEVASHLTOKAIKAMD2VREftARKARDDSRNGKKNKKDKGLLSGKLTP 421 

Query: 425 AQSKNAKKNELYLVEGDSAGGSAKQGRDRKFQAILPLRGKVIJ^ITAKAK^KyDIIKlffiEIOT 484 

AQSKNAKKNELYLVEGDSAGGSAKQGRDRKFQAILPLRGKVLNT KAKMADI+KNEEINT 
Sbjct: 422 AQSI<NAI0^LYLTOGDSAGGSAKQGRDRKFQAILPLRGI?/1OTEKAKMADILKNEEINT 481 

Query: 485 MIHTIGAGVGPDFNLDDINYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVEEGHVYIA 544 

M++TIGAGVG DFNL+DINYDKIIIMTDADTDGaHIQTLLIjTFFYRyMRPLVE GHVYIA 
Sbjct: 482 MVYTIGAGVGADFNLEDINYDKI I IMTDADTEGAHIQ^LLLTFFYRYMRPLVEAGHVYIA 541

Query: 545 LPPIiYKMSKGKGKKEIVEYAVITDIELEELRQKFGKGSLLQRYKGLGEMNADQLWETTMNP 604 

LPPLYKMSKGKGK E + YAWTD ELE+LR++FGKG++LQRYKGLGEMNA+QLWETTM+P 
Sbjct: 542 LPPLYKMSKGKGKTEKIAYAWTDGELEDLRREFGKGAILQRYKGLGEMNANQLMETTMDP 601 

Query: 605 ETRTLIRVTIEDLARAERRVNVLMGDKVPPRRQWIEDNVKFTLEENTVF 653 

ETRTL1RVTI+DLARAERRV+VLMGDK PRRQWIEDNVKFTLEENTVF 
Sbjct: 602 ETRTLIRVTIDDLARAERRVSVLMGDKAAPRRQW1EDNVKFTLEENTVF 650 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 929 

A DNA sequence (GBSx0985) was identified in S.agalactiae <SEQ ID 2821> which encodes the amino
acid sequence <SEQ ID 2822>. Analysis of this protein sequence reveals the following: 
Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 378 - 394 ( 378 - 394)

Final Results

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
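The "INTEGRAL Likelihood ... Transmembrane" lines pair a score with a residue range and a second, parenthesised range. A minimal sketch of reading such lines is given below. The name `parse_segments` and the dictionary keys are hypothetical; the parenthesised range is stored as printed under the key `outer`, without assuming what it denotes.

```python
import re

SEGMENT = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return one dict per predicted transmembrane segment found in the text."""
    segments = []
    for score, start, end, outer_start, outer_end in SEGMENT.findall(text):
        segments.append({
            "likelihood": float(score),
            "core": (int(start), int(end)),
            # Parenthesised range, kept as printed (its meaning is not stated here).
            "outer": (int(outer_start), int(outer_end)),
        })
    return segments

line = "INTEGRAL Likelihood = -0.80 Transmembrane 378 - 394 ( 378 - 394)"
for seg in parse_segments(line):
    start, end = seg["core"]
    print(seg["likelihood"], seg["core"], end - start + 1)
```

For this line the first range spans 17 residues, as do the other fully printed transmembrane ranges in this section.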

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD34369 GB:AF129764 ParC [Streptococcus mitis] 
Identities = 640/820 (78%), Positives = 722/820 (88%), Gaps = 5/820 (0%)



Query:
Sbjct: 1

Query: 61
Sbjct: 61

Query: 121
ARLSEIAGYLLQDIDK TVPF+WNFDDTEKEPTVLPAAFPNLLVNG+TGISAGYATDIPP
Sbjct: 121

Query: 181
HNLAEVIDA VYMIDHP AK-DKLMEFLEC-PDFPTG IIQG+DEI+KAYETGKGRV VRS
Sbjct: 181

Query: 241
3 LKGGK+QI++TEIPYE+NK+ LVK+IDDVRVN+KV GIAEVRDESDRDGLRIAI
Sbjct: 241

Query: 301
ELKK+A+ +VLNYLFKYTDLQ+NYNFNMVAIDv+TP+QVG+ IL+SYIAHRRE+I+AR
Sbjct: 301 ELKOTANTELV^LFK^TDLQINYNBMMVAIDlOTPRQVGIVPILSSYIJfflRREVIIiAR 360

Query: 361
Sbjct: 361

Query: 421
TLQLYRLTNTD+V L+EEE ELR++I ML All DERTMYN+MK+ELREVKKKFA R S
Sbjct: 421

Query: 481
L++ A+ IEIDTASLI EEDTYVSVT+ GY+KRTSPRSF AST+ +E+GKR+DD LIFV
Sbjct: 481

Query: 541
Sbjct: 541

Query: 600
TYFA T LGQIKR ER+E +PWRTYKSK+ KYAKLK D +V VAPI+L+DV+L++
Sbjct: 601

Query: 660
NGYALRF+I +VPWG+KAAGVKAMNLK+ D + SAFI NT+S YLLT RGSLKR++ID
Sbjct: 661

Query: 720
IP TSRA RGLQVLRELK+KPHRVF AG V
Sbjct: 721

Query: 780
L + SERTSNGSF+SD ISDEEVF +K
Sbjct: 777 ...1SRLQDLNLSERTSNGSFI SDTI SDEEVFDAYLK 316

A related DNA sequence was identified in S.pyogenes <SEQ ID 2823> which encodes the amino acid 
sequence <SEQ ID 2824>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 376 - 392 ( 376 - 394)

Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 633/819 (77%) , Positives = 719/819 (87%) 

Query: 1 MSNIQNMSLEDIMGERFGRYSKYIIQERALPDIRDGLKPVQRRILYSMNKDGNTFEKGFR 60
MSNIQNMSLEDIMGERFGRYSKYIIQERALPDIRDGLKPVQRRILYSMNKDGNTFEKG+R
Sbjct: 3 MSNIQNMSLEDIMGERFGRYSKYIIQERALPDIRDGLKPVQRRILYSMNKDGNTFEKGYR 62

Query: 61 ...IGSMDGDPAAAMRYTE 120
KSAKSVGN+MGNFHPHGDSSrYDAMVRMSQDWKNRE L+EMHGNNGSMDGDP AAMRYTE
Sbjct: 63

Query: 121
ARLSEI AGYLLQD I +KNTV FAWNFDDTEKEPTVLPAAFPNLLVKG++GISAGYATDIPP
Sbjct: 123

Query: 181
HNL+EVIDAWYMIDHPKA L+KLMEFLPGPDFPTG IIQG DE I + KAYETGKGRV VRS
Sbjct: 183

Query: 241 RTAIETLKGGZKQIIVTEIPYEVNKSVLVKRIDDVRVNNKVPGIAEVRDESDRDGLRIAI 300
RT IE LKGGK+QIIVTEIPYEVNK+VLVK+IDDVRVNNKVPGI EVRDESDR GLRIAI
Sbjct: 243 RTEIEELKGGKQQIIVTEIPYEVNKAVLVKKIDDVRVNNKVPGIVEVRDESDRTGLRIAI 302

Query: 301 ELKKEADETIVHWTiFKYTDLQVNYNFNWAIDDYTPKQVG I IAR 360 

ELKKEAD +IMYL KYTDLQVNYNFNMVAID +TP+QVGL +IL+SYI+HR++III R

Sbjct: 303 ELKKEADSQTIEOTLLKYTDLQVNYNFNMVAIDHFTPRQVGLQKILSSYISHRKDIIIER 362 

Query: 361 SKFDKEKAEKRLHIVEGLIRVLSILDEVIALIRASENKADAKENLKVSYEFSEAQAEAIV 420 
SKFDK KAEKRLHIVEGLIRVLSILDE+IALIR+S+NKADAKENLKVSY+FSE QAEAIV 
Sbjct: 363 SKFDKAKAEKRLHIVEGLIRVLSILDEIIALIRSSDNKADAKENLKVSYDFSEEQAEAIV 422

Query: 421 TLQLYRLTNTDIVTLREEEEELRQQITMbKAIISDERTMYNVMKRELREVKKKFAHTRRS 480 

TLQLYRLTNTD I VTL + EE +LR IT L All DE TMYNVMKRELREVKKKFAN R S 
Sbjct: 423 TLQLYRLTNTDIVTLQlffiENDLRDLITTLSAIIGDEATMYNVI'IKRELREVKKKFANPRIiS 482 


Query: 481 ELQELAETIEIDTASLIIEEDTYVSVTRGGYVKRTSPRSFNASTVDELGKREDDELIFVS 540 

ELQ ++ IEIDTASLI EE+T+VSVTRGGY+KRTSPRSFNAS+++E+GKR+DDELIFV 
Sbjct: 483 ELQAESQIIEIDTASLIAEEETFVSVTRGGYLKRTSPRSFNASSLEEVGKRDDDELIFVK 542 

Query: 541 NAKTTQHLLMFTOTjGNLAYRPVHELADIRWKDVGEHLSQNLVNFASNEEIIYAELVDDFT 600

AKTT+HLL+ FT LGN+ YRP+HEL D+RWKD+GEHLSQ + NFA+ EEI+YA++V F 
Sbjct: 543 QAKTTEHLLLFTTLGNVIYRPIHELTDLRWKDIGEHLSQTISNFATEEEILYADIVTSFD 602 

Query: 601 KETYFAVTSLGQIKRFERQEISPWRTYKSKTAKYAKLKSVEDYVVTVAPIQLEDVILVTY 660 
+ Y AVT G IKRF+R+E+SPWRTYKSK+ KY KLK +D WT++P+ +ED++LVT

Sbjct: 603 QGLYVAVTQNGFIKRFDRKELSPWRTYKSKSTKYVKLKDDKDRVVTLSPVIMEDLLLVTK 662 

Query: 661 NGYALRFSINDVPVVGSKAAGVKAMNBKDRDHIVSAFIANTTSLYBLTHRGSLKRMAIDV 720 
NGYALRFS +VP+ G K+AGVK +NLK+ D + SAF + S ++LT RGSLKRMA+D 
Sbjct: 663 NGYALRFSSQEVPIQGLKSAGVKGINLKNDDSLASAFAVTSNSFFVLTQRGSLKRMRVDD 722

Query: 721 IPTTSRANRGLQVLRELKSKPHRVFKAGPVYLEDSSFEFDLFSSVSNHEGDTFVLEIMSK 780 

IP TSRANRGL VLRELK+KPHRVF AG V + S+ +FDLF+ + E + +LE++SK 
Sbjct: 723 IPQTSRANRGLLVLRELKTKPHRVFLAGGVQSDTSAEQFDLFTDIPEEETNQQMLEVISK 782 


Query: 781 TGKVYDVDLSQWSFSERTSNGSFVSDKISDEEVFSVKIK 819 

TG+ Y++ L S SER SNGSF+SD ISD+EV + + 
Sbjct: 783 TGQTYEIALETLSLSERI SNGS FI SDT I SDQEVLVARTR 821 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 930 

A DNA sequence (GBSx0986) was identified in S.agalactiae <SEQ ID 2825> which encodes the amino 
acid sequence <SEQ ID 2826>. Analysis of this protein sequence reveals the following: 

Possible site: 4S

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3369 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF64593 GB:AF169649 branched-chain aminotransferase IlvE
[Lactococcus lactis]

Identities = 259/340 (76%) , Positives = 294/340 (86%) 

Query: 1 MIWLDWDNLGFAYRKLPFRYISHFKIX3KWDDGKLTDE1ATLHISESSPALHYGQQAFEGL 60 
M +NLDW+NLGF+YR LPFRYI+ FKDGKW G+LT D LHISESSPALHYGQQ FEGL 
Sbjct: 1 MAINLDWENLGFSYRNLPFRYIARFKDGKWSAGELTGDNQLHISESSPALHYGQQGFEGL 60

Query: 61 KAYRTKM3SIQLFRPDQNAERLQRTADRLLMPHVPTDKFIAAVKSVVRANEEFVPPYGTG 120
KAYRTKDGS IQLFRPDQNA RLQ+TA RL M V T+ FI AVK W+AN++FVPPYGTG
Sbjct: 61

Query: 121
ATLY+RPLLIGVGD+IGVKPA+EYIF VFAMPVGSYFKGGL P+ F++S+EYDRAAP GT
Sbjct: 121

Query: 181
G AKVGGNYAASL A ++D IYLDP+THTKIEEVGAANFFGIT DN+FITPLS
Sbjct: 181

Query: 241
PSILPSITKYSLLYIA+ R G++AIEG+V+ +L KF EAGACGTAA+ISPIG I +G+D
Sbjct: 241

Query: 301
++F+SETEVGP ++LYDELVGIQFGDVEAPEGWI KVD
Sbjct: 301



A related DNA sequence was identified in S.pyogenes <SEQ ID 2827> which encodes the amino acid 
sequence <SEQ ID 2828>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1208 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 280/340 (82%) , Positives = 308/340 (90%) 



Query: : 

MT+ +DWDNLGF Y KLPFRYIS++K+G+WD G+LT+DATLHISES+PALHYGQQAFEGL 
Sbjct: 16 MTIAIDWDNtGFEYHKLPFRYISYYKNGCWDKGQLTEDATLHISESAPALHYGQQAFEGL 75 

Query: 61 KAYRTKDGSIQLFRPDQNAERLQRTADRLLMPHVPTDKFIAAVKSWRANEEFVPPYGTG 120 

KAYRTKDGSIQLFRPD+NA RLQ TADRLLMP V T++FI A K W+ANE+FVPPYGTG 
Sbjct: 76 KAYRTKDGSIQLFRPDRNAVRLQATADRLLMPQVSTEQFIDAAKQWKANEDFVPPYGTG 135 

Query: 121 ATLYIRPLLIGVGDIIGVKPAEEYIFTVFAMPVGSYFKGGLTPTNFIVSKEYDRAAPNGT 180 

ATLY+RPLLIGVGDIIGVKPAEEYIFT+FAMPVG+YFKGGL PTNFIVS+ +DRAAP GT 
Sbjct: 136 ATIiYDRPLLIGVGDIIGVKPAEEYIFTIFAMPVGNYFKGGLAPTNFIVSEAFDRAAPYGT 195 

Query: 181 GAAKVGGNYAASLLPGKYAHEKQFSDVIYLDPATHTKIESVGAANFFGITKDNQFITPLS 240 

GAAKVGGNYA SLLPGK A FSDVIYLDPATHTKIEEVGAANFFGIT +N+F+TPLS 
Sbjct: 196 GAAKVGGNYAGSLLPGICAAKSAGFSDVIYLDPATHTKIEEVGAANFFGITANNEFVTPLS 255 

Query: 241 PSILPS1TKYSLLYIAKERFGMEAIEGDVFVDELDKFTEAGACGTAAVISPIGGIQNGDD 300 

PSILPS1TKYSLL LA+ER GM IEGDV ++ELDKF EAGACGTAAVISPIGGIQ D+ 
Sbjct: 256 PSILPSITKYSLLQLAEERLGMTVIEGDVPINELDKFVEAGACGTAAVISPIGGIQYKDN 315 

Query: 301 FHVFYSETEVGPATRKLYDELVGIQFGDVEAPEGWIYKVD 340 

HVFYSETEVGP TR+LYDELVGIQFGD+EAPEGWI KVD 
Sbjct: 316 LHVFYSETEVGPVTRRLYDELVGIQFGDIEAPEGWIVKVD 355 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 931 

A DNA sequence (GBSx0987) was identified in S.agalactiae <SEQ ID 2829> which encodes the amino 
acid sequence <SEQ ID 2830>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3459 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9365> which encodes amino acid sequence <SEQ ID 9366> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10915> which encodes amino
acid sequence <SEQ ID 10916> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2831> which encodes the amino acid
sequence <SEQ ID 2832>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3043 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 22/36 (61%) , Positives = 30/36 (83%) 


Query: 4 IVSKKDKKIEIQISDAQVTVNGTKVDGYQLVMEKKL 39 

++SKKDKKIEIQ+ D +V VN TK+DGYQL + K++ 
Sbjct: 1 VMSKKDKKIEIQL1DHKVMVNETKIDGYQLQIGKRV 36 
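The identity count in a summary line can be re-derived from the two aligned rows by comparing them position by position. The sketch below does this for the short ungapped alignment above. The function name `count_identities` is hypothetical, and the rows are copied as printed, including a character in the subject row that appears to have been damaged in scanning.

```python
def count_identities(query, subject):
    """Count aligned positions where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # A gap character never counts as an identity.
    return sum(q == s and q != "-" for q, s in zip(query, subject))

query = "IVSKKDKKIEIQISDAQVTVNGTKVDGYQLVMEKKL"
subject = "VMSKKDKKIEIQL1DHKVMVNETKIDGYQLQIGKRV"
identical = count_identities(query, subject)
print(f"{identical}/{len(query)} ({identical * 100 // len(query)}%)")
```

This reproduces the printed "Identities = 22/36 (61%)". Counting positives would additionally need the substitution matrix used for the search, which is not given here.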

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 932 

A DNA sequence (GBSx0988) was identified in S.agalactiae <SEQ ID 2833> which encodes the amino 
acid sequence <SEQ ID 2834>. This protein is predicted to be glycyl-tRNA synthetase beta subunit (glyS). 
Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1617 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB73488 GB:AL139077 glycyl-tRNA synthetase beta chain
[Campylobacter jejuni]
Identities = 33/90 (36%) , Positives = 49/90 (53%) , Gaps = 2/90 (2%) 

Query: 3 RAFNIAEKVTHSVLVDSSLFENNQEKALYQAILSLELTEDMHDNLDKLFALSPIINDFFD 62 
R N+A K H V D ShF E LY+A + + L+ LFAL P I++FF+

Sbjct: 570 RIANIATKNPHKV--DESLFVQEAESKLYKAFQEKTKANSLQEKLENLFALKPFIDEFFN 627 

Query: 63 NTMVMTDDEKMKQNRIiAILNSLVAKARTVA 92 
M+ +DEK+K NR A++ + A+ +A
Sbjct: 628 QVMINAEDEKLKNNRQALVYEIYAEFLKIA 657

There is also homology to SEQ ID 2836. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 933 

A DNA sequence (GBSx0989) was identified in S.agalactiae <SEQ ID 2837> which encodes the amino 
acid sequence <SEQ ID 2838>. Analysis of this protein sequence reveals the following: 

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4825 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13672 GB:Z99113 ynzC [Bacillus subtilis] 
Identities = 41/72 (56%), Positives = 56/72 (76%) 


Query: 5 KIARINELSKKKKTVGLTGEEKVEQAKLREEYIEGFRRSVRHHVEGIKLVDDEGNDVTPE 64 

KIARINEL+ K K +T EEK EQ KLR+EY++GFR S+++ ++ +K++D EGNDVTPE 
Sbjct: 6 KIARINELAAKAKAGVITEEEKAECjQKLRQEYLKGFRSSMKNTLKSVKIIDPEGNDVTPE 65 

Query: 65 KLRQVQREKGLH 76

KL++ QR LH 
Sbjct: 66 KLKREQRNNKLH 77 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2839> which encodes the amino acid
sequence <SEQ ID 2840>. Analysis of this protein sequence reveals the following:
Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4303 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 79/85 (92%) , Positives = 83/85 (96%)

Query: 1 MDPKKIARINELSKKKKTVGLTGEEKVEQAKLREEYIEGFRRSVRHHVEGIKIVDDEGND 60 

MDPKKIARINEL+KKKKTVGLTG EKVEQAKLREEYIEG+RRSVRHH+EGIKLVD+EGND 
Sbjct: 1 MDPKKIARINEIAKKKKTVGLTGPEKVEQAKLREEYIEGYRRSVRHHIEGIKLVDEEGND 60 


Query: 61 VTPEKLRQVQREKGLHGRSLDDPNS 85 

VTPEKLROVQREKGLHGRSLDDP S 
Sbjct: 61 VTPEKLRQVQREKGLHGRSLDDPKS 85 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 




Example 934 

A DNA sequence (GBSx0990) was identified in S.agalactiae <SEQ ID 2841> which encodes the amino 
acid sequence <SEQ ID 2842>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2343 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB69985 GB:U94355 glycerol kinase [Enterococcus casseliflavus]
Identities = 381/496 (76%) , Positives = 439/496 (87%) 


Query: 3 SEEKYIMAIDQGTTSSRAIIFNKKGEKIA8SQKEFPQIFPQAGWVEHNANQIWNSVQSVI 62 

+E+ Y+MAIDQGTTSSRAIIF++ G+KI SSQKEFPQ FP++GWVEHNAN+IWNSVQSVI 
Sbjct: 2 AEKNYVMAIDQGTTSSRAIIFDRNGKKIGSSQKEFPQYFPKSGWVEHNANEIWNSVQSVI 61 

Query: 63 AGAFIESSIKPGQIEAIGITNQRETTWWDKKTGLPIYNAIVWQSRQTAPIADQLKQEGH 122

AGAFIES I+P I IGITNQRETTWWDK TG PI NAIVWQSRQ++PIADQLK +GH 
Sbjct: 62 AGAFIE3GIRPEAIAGIGITNQRETTVVWDKTTGQPIANAIVWQSRQSSPIADQLKVDGH 121 

Query: 123 TNMIHEKTGLVIDAYFSATKVRWILDHVPGIAQERAEKGELLFGTIDTWLVWKLTDGLVHV 182 
T MIHEKTGLVIDAYFSATKVRW+LD++ GAQE+A+ GELLFGTID+WLWKLTDG VHV

Sbjct: 122 TEMIHEKTGLVIDAYFSATKVRWLLDNIEGAQEKADNGELLFGTIDSWLVWKLTDGQVHV 181 

Query: 183 TDYSNAARTMLYNIKELKJTODEILELI^IPKAMLPEVKBNSEWGKTTPFHFYGGEVPIS 242 
TDYSNA+RTMLYNI +L+WD EIL+LLNIP +MLPEVKSNSEVYG T +HFYG EVPI+ 
Sbjct: 182 TDYSNASRTMLYNIHKLEWDQEILDLLNIPSSMLPEVKSNSEVYGHTRSYHFYGSEVPIA 241

Query: 243 GMAGDQQAALFGQIAFEPGMVKNTYGTGSFIIMNTGEEMQLSQNNLLTTIGYGINGKVHY 302 

GMAGDQQAALFGQ+AFE GM+KNTYGTG+FI+MNTGEE QLS N+LLTTIGYGINGKV+Y 
Sbjct: 242 GMAGDQQAALFGQMAFEKGMIKIsrrYGTGAFIVMlWGEEPQLSDNDLLTTIGYGINGKVYY 301


Query: 303 ALEGSIFIAGSAIQWLRDGLRMIETSSESEGLAQSSTSDDEVYWPAFTGLGAPYWDSNA 362 

ALEGSIF+AGSAIQWLRDGLRMIETS +SE LA + D+EVYWPAFTGLGAPYWDS A 
Sbjct: 302 ALEGSIWAGSAIQWLRDGLRMIETSPQSEEIAAKAKGDNEVYWPAFTGLGAPYWDSEA 361 

Query: 363 RGSVFGLTRGTSKEDFVKATLQSIAYQVRDVIDTMQVDSGIDIQQDRVDGGAAMNNLLMQ 422

RG+VFGLTRGT+KEDFV+ATLQ++AYQ +DVIDTM+ DSGIDI L+VDGGAA N+LLMQ 
Sbjct: 362 RGAVFGLTRGTTKEDFvRATLQAVAYQSKDVIDTMKKDSGIDIPLLKOTGGAAKNDIjLMQ 421 

Query: 423 FQADILGIDIARAKNLETTALGAAFLAGLSVGYWESMDELKELNATGQLFQATMNESRKE 482 
FQADIL ID+ RA NLETTALGAA+LAGL+VG+W+ +DELK + GQ+F M ++

Sbjct: 422 FQADILDIDVQRAANLETTALGAAYLAGIAVGFWKDLDELKSMAEEGQMFTPEMPAEERD 481 

Query: 483 KLYKGWRKAVKATQVF 498 
LY+GW++AV ATQ F 
Sbjct: 482 NLYEGWKQAVAATQTF 497

A related DNA sequence was identified in S.pyogenes <SEQ ID 2843> which encodes the amino acid 
sequence <SEQ ID 2844>. Analysis of this protein sequence reveals the following: 

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2282 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 464/500 (92%) , Positives = 484/500 (96%)

Query: 3 SEEKYIMAIDQGTTSSRAIIFNKKGEKIASSQKEFPQIFPQAGWVEHNflNQITOSVQSVI 62 

S+EKYIMAIDQGTTSSRAI I FN+KGEK+ +S SQKEFPQI PP AGWVEHNANQIWNSVQSVI 
Sbjct: 2 SQEKY-IMAIDQGTTSSRAIIFNQKGEK\ r SSSQKEFPQIFPHAGWVEHNANQIVINSVQSVI 61 

Query: 63 AGAFIESSIKPGQIFAIGITNQRETTVV'WDKKTGLPIYNAIIVWQSRQTAPIADQLKQEGH 122 

AGAFIESSIKP QIEAIGITNQRETTWWDKKTG+PIYNAIVWQSRQTAPIA+QLKQ+GH 
Sbjct: 62 AGAFIESSIKPSQIEAIGITNQRETTVVWDKKrGVPIYNAIVWQSRQTAPIAEQLKQDGH 121 

Query: 123 TNMIHEKTGLVIDAYFSATIOTRWILDHVPGAQERAEKGELLFGTIDTWLWKLTDGLVHV 182 

T MIHEKTGLVIDAYFSATK+RWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDG VHV 
Sbjct: 122 TKMIHEKTGLVIDAYFSATKIRWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDGAVHV 181 

Query: 183 TDYSNARRTMLYNIKELKWDDEILBLLNIPKAMLPEVKSNSEVYGKTTPFHFYGGEVPIS 242 

TDYSNAARTMLYNIK+L WDDEILELLNIPK MLPEVKSNSE+YC 
Sbjct: 182 TDYSNAARTMLYNIKDLTWDDEILELLNIPI 

Query: 243 G^^GDQQAALFGQIAFEPG^WKOTYGTGSFIIMNTGEEMQLSQNNLLTTIGYGINGKVHY 302 

GMAGDQQAALFGQLAFEPGMVKNTYGTGS FI IMNTG+EMQLS NNLLTTIGYGIMGKVHY 
Sbjct: 242 GMAGDMAALFGQIAFEPGMVKNTYGTGSFIIMNTGDEMQLSSNNLLTTIGYGINGKVHY 301 

Query: 303 ALEGSIFIAGSAIQWLRDGLRMIETSSESEGtAQSSTSDDEVYWPAFTGLGAPYWDSNA 362

ALEGSIFIAGSAIQWLRDGL+MIETS ESE A +STSDDEVYWPAFTGLGAPYWDSMA 
Sbjct: 302 ALEGSIFIAGSAIQWIjRDGLKMIETSPESEQFALASTSDDEVYWPAFTGLGAPYWDSNA 361 

Query: 363 RGSVFGLTRGTSKEDFVKATLQSIAYQVRDVIDTWQVDSGIDIQQLRVDGGAAMHNLLMQ 422

RGSVFGLTRGTSKEDFVKATLQSIAYQVRDVIDTMQVDSGIDIQQLRVDGGAAMNN+LMQ 
Sbjct: 362 RGSWGLTRGTSKEDFVKATLQSIAYQWDVIETMQVDSGIDIQQLRVDGGAAMNNMLMQ 421 

Query: 423 FQADILGIDIARAHttETTALGAftFIAGLSVGYWES^ELKELNATGQLFQATMNESRKE 482 

FQADILGIDIARAKNLETTALGAAFLAGL+VGYWE MD LKEIjNATGQLF+A+MNESRKE 
Sbjct: 422 FQADIMIDIARAKNLETTALGAAFIAGIAVGYVreDMDALKEimTGQLFKASI^SRK^ 481 

Query: 483 KLYKGWRKAVKATQVFAQED 502 

KLYKGW+4AVKATQVF QE+ 
Sbjct: 482 KLYKGWKRAVKATQVFTQEE 501 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 935 

A DNA sequence (GBSx0992) was identified in S.agalactiae <SEQ ID 2845> which encodes the amino 
acid sequence <SEQ ID 2846>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3146 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 936

A DNA sequence (GBSx0993) was identified in S.agalactiae <SEQ ID 2847> which encodes the amino 
acid sequence <SEQ ID 2848>. This protein is predicted to be alpha-glycerophosphate oxidase (glpD). 
Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 20 - 36 ( 20 - 36)

Final Results

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC34740 GB:U94770 alpha-glycerophosphate oxidase [Streptococcus pneumoniae] 
Identities = 464/608 (76%), Positives = 539/608 (88%) 

Query: 1 MEFSRETKRLALQRMQDRTLDLLIIGGGITGAGVALQAAASGLDTGLIEMQDFAEGTSSR 60 

MEFS++TR L++++MQ+RTLDLLIIGGGITGAGVALQAAASGL+TGLIEMQDFAEGTSSR 
Sbjct: 1 MEFSKKTRELS I KKMQERTLDLLI IGGGITGAGVALQAAASGLETGLIEMQDFAEGTSSR 60 

Query: 61 STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHIPKPDPMLLPVYDEPGSTFSMFRL 120 

STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHIPKPDPMLLPVYDE G+TFS+FRL 
Sbjct: 61 STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHIPKPDPMLLPVYDEDGATFSLFRL 120 

Query: 121 KVAMDLYDLLAGVTOTPAANKVLSAEDVLKREPDLQKEG 180 

KVAMDLYDLLAGV+NTP ANKVLS + VL+R+P+L+KEGL+GGGVYLDFRNNDARLVIEN 
Sbjct: 121 KVAMDLYDLI^GVSNTPTANKVIiSKDQVLERQPNLKKEGLVGGGvYLDFRNNDARLVIEN 180 

Query: 181 IKRANPXIGAYIASHVKAEDFLFDDNNQIIGVRARDLLTDQVIDIKARLVIOTTGPWSDTV 240 

IKRAN+DGA IA+HVKAE FLFD++ +1 GV ARDLLTDQV +IKARLVINTTGPWSD V 
Sbjct: 181 IKRANQDGALIANHVT^AEGFLFDESGKITGWARDLLTDQVFEIKARLVINTTGPWSDKV 240 

Query: 241 RNFSNEGKQIHQLRPTKGVHLVVDRQKIJSriSQPVYVDTGLNDGRMIFVLPREDKTYFGTT 300 

RN SN+G Q Q+RPTKGVHLWD K+ +SQPVY DTGL DGRM+FVLPRE+KTYFGTT 
Sbjct: 241 RNLSNKGTQFSQMRPTKGVHLWDSSKIKVSQPVYFDTGLGDGRMVFVLPRENKTYFGTT 300 

Query: 301 DTDYHGDLEHPTVTKEDVDYLLNIVNKRFPEAELTIDDIESSWAGLRPLLSGNSASDYNG 360 

DTDY GDLEHP VT+EDVDYBL IVN RFPE+ +TIDD I ESSWAGLRPL+ +GNSASD YNG 
Sbjct: 301 DTDYTGDLEHPKVTQEDVDYLLGIVNNRFPESKITIDDIESSWAGLRPLIAGNSASDYNG 360 

Query: 361 GNSGKLSDESFEELIDSVKDYIAHKNHREDVEKAISHVESSTSEKELDPSAVSRGSSFER 420 

GN+G +SDESF+ LI +V+ Y++ 4- REDVE A+S +ESSTSEK LDPSAVSRGSS +R 
Sbjct: 361 GNNGTISDESFDNLIATVESYLSKEKTREDVESAVSKLESSTSEKHLDPSAVSRGSSLDR 420 

Query: 421 DDNGLLTLAGGKITDYRKMAEGAMETIINILDKEYireKFKLINSKTYPVSGGEINPSim) 480 

DDNGLLTLAGGKITDYRKMAEGAME +++IL E++R FKLINSKTYPVSGGE+NP+NVD 
Sbjct: 421 DDNGLLTIAGGKITDYRKMAEQAMERVVDILKAEFDRSFKLINSKTYPVSGGELNPANVD 480 

Query: 481 SEIEAYAQLGTLSGLSIEDARYIANLYGSNAPKLFALTRQITEAEGLSLVETLSLHYAMD 540 

SEIEA+AQLG GL ++A Y+ANLYGSNAPK+FAL + +A GLSL +TLSLHYAM 
Sbjct: 481 SEIEAFAQLGVSRGLDSKEAHYLAHLYGSNAPKVFALAHSLEQAPGLSLADTLSLHYAMR 540 

Query: 541 YEMALSPTDFFLRR1XIHMLFMRDNLDSLIQPVIDEMAKHYQWSDQDKTFYEEELHETLKD 600 

E+ALSP DF LRRTNHMLFMRD+LDS+++PV+DEM + Y W++++K Y ++ L + 
Sbjct: 541 NElMSPVDFLLRRMfflMrjFMRDSLDSIVEPVLD^^ 600 

Query: 601 NDLAALKD 608 

NDIA LK+ 
Sbjct: 601 NDLAELKN 608 



There is also homology to SEQ ID 128.

SEQ ID 2848 (GBS93) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 7; MW 70.6 kDa).

GBS93-His was purified as shown in Figure 192, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 937 

A DNA sequence (GBSx0994) was identified in S.agalactiae <SEQ ID 2849> which encodes the amino 
acid sequence <SEQ ID 2850>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.09S5 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
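
The localisation predictions in the "Final Results" blocks of these examples follow a fixed line layout, so they can be tabulated mechanically. The following is a minimal Python sketch, not part of the original analysis, that extracts the certainty values and reports the highest-scoring compartment. The function names and the regular expression are illustrative assumptions; the sample text is the Final Results block of Example 935.

```python
import re

# Illustrative pattern (an assumption, not taken from the prediction program):
# "bacterial <compartment> --- Certainty=<score> (<label>)"
RESULT_RE = re.compile(
    r"bacterial\s+(?P<location>\w+)\s*-*\s*"
    r"Certainty\s*=\s*(?P<score>[\d.]+)\s*\((?P<label>[^)]+)\)"
)


def parse_final_results(text):
    """Return (compartment, certainty, label) tuples found in the text."""
    return [
        (m.group("location"), float(m.group("score")), m.group("label"))
        for m in RESULT_RE.finditer(text)
    ]


def best_location(text):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None


example_935 = """
bacterial cytoplasm --- Certainty=0.3146 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

print(best_location(example_935))
```

Taking the maximum certainty is a simplification chosen for this sketch; the label in parentheses is carried along unchanged so that "Affirmative" and "Not Clear" calls remain visible.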

Example 938 

A DNA sequence (GBSx0995) was identified in S.agalactiae <SEQ ID 2851> which encodes the amino 
acid sequence <SEQ ID 2852>. This protein is predicted to be glycerol uptake facilitator protein (glpF). 
Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.43 Transmembrane 220 - 236 ( 216 - 236)
INTEGRAL Likelihood = -6.48 Transmembrane 139 - 155 ( 136 - 158)
INTEGRAL Likelihood = -3.88 Transmembrane 87 - 103 ( 83 - 107)
INTEGRAL Likelihood = -3.03 Transmembrane 164 - 180 ( 162 - 183)

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
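
The predicted transmembrane segments listed above can be put into sequence order to give a rough picture of the membrane topology. The following Python sketch is an illustration only and was not part of the original analysis; the `Segment` type and the regular expression are assumptions. It parses the four INTEGRAL lines reported for SEQ ID 2852 and prints each segment with its length.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # more negative values indicate a stronger prediction
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


# Illustrative pattern for "Likelihood = -7.43 Transmembrane 220 - 236 (...)"
SEGMENT_RE = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Extract the predicted transmembrane segments from prediction output."""
    return [
        Segment(float(lik), int(start), int(end))
        for lik, start, end in SEGMENT_RE.findall(text)
    ]


glpf_output = """
INTEGRAL Likelihood = -7.43 Transmembrane 220 - 236 ( 216 - 236)
INTEGRAL Likelihood = -6.48 Transmembrane 139 - 155 ( 136 - 158)
INTEGRAL Likelihood = -3.88 Transmembrane 87 - 103 ( 83 - 107)
INTEGRAL Likelihood = -3.03 Transmembrane 164 - 180 ( 162 - 183)
"""

# Order the segments from N-terminus to C-terminus.
segments = sorted(parse_segments(glpf_output), key=lambda s: s.start)
for seg in segments:
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

Sorting by start position rather than by likelihood is a choice made for this sketch; the original output lists the segments from strongest to weakest prediction.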

A related GBS nucleic acid sequence <SEQ ID 8689> which encodes amino acid sequence <SEQ ID 8690> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
SRCFLG: 0
McG: Length of UR: 21
Peak Value of UR: 2.51
Net Charge of CR: -2
McG: Discrim Score: 4.43
GvH: Signal Score (-7.5): -0.139999
Possible site: 50
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 51
ALOM program count: 4 value: -7.43 threshold: 0.0
INTEGRAL Likelihood = -7.43 Transmembrane 215 - 231 ( 211 - 231)
INTEGRAL Likelihood = -6.48 Transmembrane 134 - 150 ( 131 - 153)
INTEGRAL Likelihood = -3.88 Transmembrane 82 - 98 ( 78 - 102)
INTEGRAL Likelihood = -3.03 Transmembrane 159 - 175 ( 157 - 178)
PERIPHERAL Likelihood = 4.98 65
modified ALOM score: 1.99
icm1 HYPID: 7 CFP: 0.397

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 7 DIFGEFLGTALLVLLGNGWAGWLPKTKNHMSGWrVITFGWGLAVAIAALVSGNISPAH 66 
++FGEFLGT +L+LLGNGWAGVVLPKTK+++SGWIVIT G+AVA+A VSG +SPAH

Sbjct: 4 ELFGEFLGTL1LILLGNGWAGWLPKTKSNSSGWIVITMV-GIAVAVAVFVSGKLSPAH 62 

Query: 67 IiNPAVSLAFAIKGDLAWGTAILYMIAQI IGAMLGSLLVYLQFRPHYEAAENRADILGTFA 126 
LNPAV++ A+KG L W + + Y++AQ GAMLG +LV+LQF+PHYEA EN +IL TF+ 
Sbjct: 63 LNPAVTIGVALKGGLPWAS-ULPYILAQFAGA14LGQILVWLQFKPHYEAEENAGNI1ATFS 122

Query: 127 TGPALKDNFSNFLSEVLGTLVLVLTIFAIGKYNMPPGVGTMSVGMLWGIGLSLGGTTGY 186 

TGPA+KD SN +SE+LGT VLVLTIFA+G Y+ G+GT +VG L+VGIGLSLGGTTGY 
Sbjct: 123 TGPAIKDTOSNLISEILGTFVLVLTIFALGLYDFQAGIGTFAVGTLIVGIGLSLGGTTGY 182 


Query: 187 AINPARDFGPRLLHALLPMKNKGDSDWTYSWIPIVGPMVGAILAALIFAM 236 

A+NPARD GPR+4H++LP+ NKGD DW+Y+WIP+VGP+4GA LA L+F++ 
Sbjct: 183 ALNPARDLGPRIMHS ILP I PNKGDGDWS YAWI PWGPVIGAALAVLVFSL 232 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2853> which encodes the amino acid
sequence <SEQ ID 2854>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.13 Transmembrane 213 - 229 ( 209 - 232)
INTEGRAL Likelihood = -5.52 Transmembrane 137 - 153 ( 132 - 157)
INTEGRAL Likelihood = -4.35 Transmembrane 159 - 175 ( 155 - 178)
INTEGRAL Likelihood = -1.17 Transmembrane 85 - 101 ( 85 - 101)

Final Results

bacterial membrane --- Certainty=0.4652 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA91618 GB:U12567 glycerol uptake facilitator [Streptococcus pneumoniae]
Identities = 159/230 (69%), Positives = 196/230 (85%), Gaps = 1/230 (0%)

Query: 2 DIFGEFLGTALLVLLGNGWAGWLPKTKTHASGWIVIATGWGIAVAVAVFISGKVAPAH 61 
++FGEFLGT +L+LLGNGWAGWLPKTK+++SGWIVI T GIAVAVAVF+SGK++PAH 
Sbjct: 4 ELFGEFLGTLILILLGNGVVAGVVLPKTKSNSSGWIVI-TIWGIAVAVAVFVSGKLSPAH 62

Query: 62 LNPAVSLAFAMSGTIAWSTAIAYSLAQLLGAMVGSTLVFLQFRPHYLAAESQADILGTFA 121 

LNPAV++ A+ G + W++ + Y LAQ GAM+G LV+LQF+PHY A E+ +IL TF+
Sbjct: 63 I^PAVTIGVALKGGLPWAS\njPYILAOFAGAMLGQILVWLQFKPtIYEAEENAGNILATFS 122 


Query: 122 TGPAIRDTSSNLLSEIFGTFVLMLGILAFGLYDMPAGLGTLCTGTLVIGIGLSLGGTTGY 181 

TGPAI+DT SNL+SEI GTFVL+L I A GLYD AG+GT VGTL++GIGLSLGGTTGY 
Sbjct: 123 TGPAIKDTVSNLISEILGTFVLVLTIFALGLYDFCAGIGTFAVGTLIVGIGIiSLGGTTGY 182

Query: 182 AINPARDLGPRLVHAILPI^KGDSDWSYAWIPVVGPIIGAVLa.VLI)FQV 231

A+NPARDLGPR+ +H+ ILP+ NKGD DWSYAWIPWGP+IGA LAVL+F + 
Sbjct: 183 ALNPARDLGPRIMHSILPIPNKGDGDWSYAWIPWGPVIGAALAVLVFSL 232 



An alignment of the GAS and GBS proteins is shown below.

Identities = 169/232 (72%), Positives = 202/232 (86%)

Query: 6 ^IFGEFLGTALLVX.LGNGWAGVVLPKTICNHNSC-WIVITFGWGLAVAIAALVSGNISPA 65
MDIFGEFLGTALLVLLGNGWAGWLPKTK H SGWIVI GWG+AVA+A +SG ++PA
Sbjct: 1 MDIFGEFLGTALLVLLGNGWAGWLPKTKTHASGKIVIATGWGIAVAVAVFISGKVAPA 60

Query: 66 HI^PAVSIAFAIKGDIAWGTAILYMIAQIIGAMLGSLLVYLQFRPHyEAAEnRADILGTF 125
HLNPAVSLAFA+ G +AW TAI Y +AQ++GAM+GS LV+LQFRPHY AAE-f +ADILGTF
Sbjct: 61 HLWPAVSLaFAMSGTIAWSTAIAYSLAQLLGAMVGSTLVFLQFRPHYLAAESQADILGTF 120

Query: 126 ATGPALKDNFSNFLSEVLGTLVLVLTIFAIGKYNMPPGVGTMSVGMLWGIGLSLGGTTG 185
ATGPA++D SN LSE+ GT VL+L I A G Y+MP G+GT+ VG LV+GIGLSLGGTTG
Sbjct: 121 ATGPAIRDTSSNLLSEIFGTFVLMLGILAFGLYDKPAGLGTliCVGTLVIGIGLSLGGTTG 180

Query: 186 YAINPARDFGPRLLHALLPMKNKGDSDWTYSWIPIVGPWGAILAALIFAMM 237
YAINPARD GPRL+HA+LP+ NKGDSDW+Y+WIP+VGP++GA+LA L+F +M
Sbjct: 181 YAINPARDLGPRLVHAILPLNNKGDSDWSYAWIFWGPIIGAVIAVLLFQVM 232





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
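
Each alignment in this example is introduced by a summary line giving the number of identical and similar positions over the alignment length. The following Python sketch is an illustration only, not part of the original analysis, and its function name and regular expression are assumptions. It parses such a line and recomputes the exact percentages so that they can be compared with the printed whole-number values; the sample line is the one reported for the S. pneumoniae glycerol uptake facilitator hit of SEQ ID 2854.

```python
import re

# Illustrative pattern for "Identities = 159/230 (69%)" style fields.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Map each field name to (count, alignment length, printed percentage)."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY_RE.findall(line)
    }


summary_line = (
    "Identities = 159/230 (69%) , Positives = 196/230 (85%) , Gaps = 1/230 (0%)"
)

fields = parse_summary(summary_line)
for name, (count, length, printed) in fields.items():
    exact = 100 * count / length
    print(f"{name}: {count}/{length} printed {printed}% exact {exact:.1f}%")
```

The recomputed value is reported alongside the printed one rather than replacing it, because the rounding convention used by the alignment program is not stated in the text.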

Example 939 

A DNA sequence (GBSx0996) was identified in S.agalactiae <SEQ ID 2855> which encodes the amino 
acid sequence <SEQ ID 2856>. This protein is predicted to be NADH oxidase. Analysis of this protein 
sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.87 Transmembrane 152 - 168 ( 152 - 168)

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9523> which encodes amino acid sequence <SEQ ID 9524>
was also identified.



The protein has homology with the following sequences in the GENPEPT database. 



Query: 10 IVILGASFAGMTCaQKLRQI^PNWDIVLIDKEIKPDWPNGLNWYYRHEISGLNQAMWQT 69 

+V++G + AG + + + +P ++ + ++ + ++ G+ Y + + 
Sbjct: 3 VWVGCTHAGTSAVKSIIA!fflPEAEVTVYER!TOHISFLSCGIALWGGWKNAADLFYSH 62 

Query: 70 EEEQRLQNIRCLFGLKVEKINKEDR ELMLSDGSSVYYDQLICAMGSQAESTYIDG 124 

EE VE+IW +D+ L +V YD+L+ GS I G 

Sbjct: 63 PEEIASLGATVKMEHNVEEINVDDK'IVTAKKLQTGATEWSYDKLVMTTGSWPIIPPIPG 122 

Query: 125 ADAQGVLTTKrYATSQKAKQVLDKSHfCVAWGAGIIGLDIAYSIiHESGKAVTLLEAQERP 184 

DA+ +L K Y+ + + + +V TO G IG+++ + ESGK VTL++ +R 
Sbjct: 123 IDAENILLCKireSQANVIIEKAKDAKRVVVVGGGYIGIELVEAFVESGKQVTLVDGLDRI 182 



Query: 185 DFRHTDPDMSLPLLDAMAESKlHFFQNQKVEKITXTOEEaCI^TLTGDTFaVmVlLAV 244 
++ D + L + + ++ + V++ + K+ F D VI+ V
Sbjct: 183

[The sequence text of the remaining alignment rows is not legible. Surviving start positions: Query 245 / Sbjct 243; Query 305 / Sbjct 303; Query 356 / Sbjct 363; Query 416 / Sbjct 422. One match-line fragment survives:]

T ++G QL+SK + +AN L A+



A related DNA sequence was identified in S.pyogenes <SEQ ID 2857> which encodes the amino acid 
sequence <SEQ ID 2858>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = Transmembrane 155 - 171 ( 155 - 173)

Final Results

bacterial membrane --- Certainty=0.2338 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 54-56 

The protein has homology with the following sequences in the databases: 

>GP:CAA44611 GB:X62755 NADH peroxidase [Enterococcus faecalis]
Identities = 111/428 (25%) , Positives = 202/428 (46%) , Gaps = 24/428 (5%) 

Query: 10 VIGASFAGIAF^KYKDMPDSQIILIDKESCP3SYIPNGINQLFRGD1QDLSDAMWGRAC 69 

V+G+S G V++ +L+PD++I +K +++ G+ G ++D++ R 
Sbjct: 5 VLGSSHGGYEAVEELLNLHPDAEIQWYEKGDFISFLSCGMQLYLEGKVKDVNSV RYM 61 

Query: 70 LAAQIESN- -HRFIQAEVLAIEAPSNTLLLKDS -QGRVFEEGYETLVCAMGASPQSHYIE 126 

++ES + F E+ AI+ + + +KD G E Y+ L+ + GA P I 
Sbjct: 62 TGEKMESRGVWFSNTEITAIQPKEHQVTVKDLVSGEERVENYDKL IIS PGAVPFELD I P 121 

Query: 127 TSQTNKVLVTKYYEESQASLKLIEASQE VLVIGAGLIGLDLAYSLSLQGKRVKLI 181 

+ + 4 + Q ++KL + + + V+VIG+G IG++ A + + GK+V +1 

Sbjct: 122 GKDLDNIYLMR---GRQWAIKLKQKTVDPEVNNVV/IGSGYIGIEAAEAFAKAGKKVWI 178 

Query: 182 EffiMRPDFYQTDAELIAPVMAEMSTHHVTFINNKRVTAIHEIEGKVVAHTEQGDTFQGDL 241 

+ +RP D E + EM +++T + V +E +G+V + + DL 

Sbjct: 179 DILDRPLGVYLDKEFTDVLTEEMEA^ITIATGETVER-YEGDGRVQKVVTDKNAYDADL 237 

Query: 242 AILAINFRPNTHLLQGQVACALDKTILVNENLQ'TSQANIYAIGDMVSLHFGILGMDYYTP 301 

++A+ RPNT L+G + + I +E ++TS+ +++A+GD + + + 
Sbjct: 238 WVAVGVRPNTAWLKGTriELHPNGLIKTDEYMRTSEPDVFAVGDATLIKYNPADTEVNIA 297 

Query: 302 LINQAMKTGQALALHLAGYPIPPLQTVK-VLGSSHFDYYRASVGVTE EEAELY 353 

Ii A K G+ +L P+ P V+ G + FDY AS G+ E +E + 

Sbjct: 298 IATNARKQGRFAVKNLE-EPWPFPGVQGSSGLAVFDYKFASTGINEVMAQKLGKETKAV 356 

Query: 354 ^TCSYLYQNGDSKNLFWLKLIARK^DC-ILIGAQLLSIOWALVIANQLGQAI^KVTDAD 413 

YL K W KL+ ++GAQL+SK + M + A+ K+T D 

Sbjct: 357 TWEDYLMDFNPDKQKAWFKIjVYDPETrQILGAQLMSKADLTANINAISLAIQAKMTIED 416 

Query: 414 LAFQDFLF 421
LA+ DF F
Sbjct: 417 LAYADFFF 424 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 192/440 (43%), Positives = 276/440 (62%), Gaps = 7/440 (1%)

Query: 8 KVIVILG^FAGMTC^KLRQLNPBTO^ 67 

K I ++GASFAG+ K + LNP+ I+LIDKE P+Y+PNG+N +R +1 L+ AMW 
Sbjct: 6 KTIHVIGASFAGIAFVDKYKDLNPDSQIILIDKESCPNYIPNGINQLFRGDIQDLSDAMW 65 


Query: 68 -QTEEEQRLQNIRCLFGLKVEKINKEDRELM1SDGSSVY YDQLICAMGSQAESTYI 122 

+ ++++ +V I L+L D Y+ L+CAMG+ +S YI 

Sbjct: 66 GRACLAAQIESNHRFIQAEVLAIEAPSNTLLLKDSQGRVFEEGYETLVCAMGASPQSHYI 125 

Query: 123 DGADAQGVLTTKTYATSQNAI^QVLDKSHICVAWGAGIIGLDIAYSLHESGKAVTLLFAQE 182

+ + VL TK Y SQ + ++++ S +V V+GAG+IGLD+AYSL GK V L+EA E 
Sbjct: 126 ETSQTNKVLVTKYYEESQASLKLIEASQEVLVIGAGLIGLDIAySLSLQGKRVKLIEAAE 185 

Query: 183 RPDFRHTDPDMSLPLLDAMAESKLHFFQNQKT\7EKITVTREEKLCLRTLTGDTFTVDAVIL 242 
RPDF TD ++ P++ M+ + F N++V I E K+ T GDTF D IL

Sbjct: 186 RPDFYQTDAELIAPVMAEMSTHHVTFINNKRVTAIHEI - EGKWAHTEQGDTFQGDLAIL 244 

Query: 243 AVNFRPDSRLLTGLVDLSVDNSWVNDYFQTSDPNIYAIGDLIWSYFKGLNSAYYMPLIN 302 
A+NFRP++ LL G V ++D ++4VN+ QTS NIYAIGD++ +F L YY PLIN 
Sbjct: 245 AINFRPNTHLLQGQVACALDKTILVNENLQTSQANIYAIGDMVSLHFGILGMDYYTPLIN 304

Query: 303 QAIRSAQMLAYHLSGHAVPKLKITRATGSKHFGYYRANIGLTELEAGFYEDTVSVTYFPK 362 

QA+++ Q LA HL+G+ +P L+ + GS HF YYRA++G+TE EA Y DT S Y 
Sbjct: 305 QAMKTGQALALHLAGYPIPPLQTVKVLGSSHFDYYRASVGVTEEEAELYMDTCSYLYQNG 364 


Query: 363 EQYDL-RIKLIAWQKTCHLrGAQLISKENCIATANQLVQAISCDMTDFDLAFQDFIYTAR 421 

+ +L +KLIA + G L+GAQIi+SK N h ANQL OA++ +TD DLAFQDF++ 
Sbjct: 365 DSmLFWLKlIARKTDGILIGAQLLSKTNALVIAKQLGQAIALKVTDADLAFQDFLFLQG 424

Query: 422 ESEMAYMLHQAAINLYEKRI 441

S++AY LH+A + L+EKR+ 
Sbjct: 425 HSDLAYHLHEACLKLFEKRL 444 

There is also homology to SEQ IDs 1820, 1876, 4666. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 940 

A DNA sequence (GBSx0998) was identified in S.agalactiae <SEQ ID 2859> which encodes the amino 
acid sequence <SEQ ID 2860>. Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2980 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 941

A DNA sequence (GBSx0999) was identified in S.agalactiae <SEQ ID 2861> which encodes the amino 
acid sequence <SEQ ID 2862>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3548 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 942 

A DNA sequence (GBSx1000) was identified in S.agalactiae <SEQ ID 2863> which encodes the amino
acid sequence <SEQ ID 2864>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1685 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9525> which encodes amino acid sequence <SEQ ID 9526>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 2865> which encodes the amino acid
sequence <SEQ ID 2866>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3125 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 179/476 (37%), Positives = 279/476 (58%), Gaps = 5/476 (1%)

Query: 1 MRIEALMEKERRVQYRLLSFLRGSPQAIALKLALLETGLSRATFLIO'INNLNSYFEQEKV 60
M+IE LM+KERR QYRLL li -h + + LK + + LS+ T LKYI+NtN ++ +
Sbjct: 21 MKIEDLMDKERJ^QYRLLVTLYHAKETLRLKDLffiLSNLSKVTLLKYIDNLNHXCREQGL 80



Query: 61 NCRIVYYKDKLFLEEDYNLSNQE^KALMKDSIKYTILISIjFNQRQFTIVGIjSQELMVSE 120 

C+++ KD L L+E+ ++++ L+K+S+ Y IL ++ F I LS ELMVSE 

Sbjct: 81 ACQLLLEKDSLSLKENGQFHWEDLVALLLKESVAYQILTYMYCHEHFNITNLSVELMVSE 140 


Query: 121 ATLNRHLAHIiNELIAEFDIAISC<3KQIGDELQWP.YFYYELFKQLWSYDKCQNMIK10jDLD 180 
ATLNR LAHLN+LL+EFD+A+SQG+Q+G ELQKRYFY+ELF+ + ++ +LD
Sbjct: 141 ATLNRQLflHLNQLLSEFDIALSCC-RQLGSELQWRYFlTELFRHTLTRQGlDALVNQriDAS 200

Query: 181 SLILL1ERLAQHTLTREAHQNLGLWFSICHHRLLAMEKISDNLKPIVKHYQCNAFYKRLD 240 

I> LIERL +L+ EA + L +W +1 R+ 4 +D+ N F+KRL+ 

Sbjct: 201 HLATLIERLIGQSLSAEALEQLLIWIAISQARMSFQKSY1TOHFLRDSDFMTSNIFFKRLE 260 

Query: 241 AALVLYMSRFALEYREGEVIATFAFLHSQNILPINTMEYIMGFGGPIIDCVTETIIYFKK 300 

+ L+ Y+ R+ALE+ E + F FLH+ +LPI +M+Y +GFGGPI D ++E + KK 
Sbjct: 261 SMLLHYLRRYALEFDAFEAICSLFVFLHAYPIJl.PIASMKYSLGFGGPIADHISEALWLLKK 320 

Query: 301 ESILADETSDQVIYQLGQLYSHYYFFKGHILVEQPDLEQTYRLIDHNMRDKLHHISKKII 360 

++ +T ++4IY LG +S YFFKG IL + + + Y+L+ + R h I ++ 
Sbjct: 321 AHVI IHQTKEEI IYGLGI FFSKAYFFKGAI LSQPTNSQYLYQLVGEDKRALLRVI INHLV 380 

Query: 361 ANVNRIRPLTEDGCSLLTLHLLELLIFSXMSQKMPFRIGLDMTGNAVEQSLLEYRIRQHF 420 

+++ D L+ +L LLIFS P +GL + N VE ++ E IR+H 

Sbjct: 381 LQMDQ ETDFSQQLSDDIIALLIFSIERHHEPLLVG1ALGQNKOTAAIAELAIRRHL 436 

Query: 421 SGNNS I QVEPYDEGKGFD - MVIYQSHSRPYKAKLTYCLNKGASERELQE IDSLI YD 475 

Q+ PYD K +D ++ YQ+ P + Y L + +S EL +++ + D 
Sbjct: 437 GHRRDFQLMPYDHQICVYDCLITYQTVCLPRQDLPYYRLKQYSSPYELTALEAFLKD 492 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 943 

A DNA sequence (GBSx1001) was identified in S.agalactiae <SEQ ID 2867> which encodes the amino
acid sequence <SEQ ID 2868>. This protein is predicted to be transketolase (tktA-1). Analysis of this
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2034 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9527> which encodes amino acid sequence <SEQ ID 9528> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06071 GB:AP001515 transketolase [Bacillus halodurans] 
Identities = 403/661 (60%), Positives = 520/661 (77%), Gaps = 8/661 (1%) 

Query: 6 IDQLAVNTWTLSIDAIQAANSGHPGBPMGARPMAYV^ 65 

++QLAVNT+RTLSID+++ ANSGHPG+PMGAAPMA+ LW KF+N NP + 4-W NRDRFV 
Sbjct: 5 VEQLAVOTIRTLSIDSVEKMSGHPGMPMGAftP^FCU^ 63 

Query: 66 LSAGHGSALLYSLLHLAGYDLSIDDLKQFRQWGSKTPGHPEVNHTDGVEATTGPLGQGIA 125 

LSAGHGS LLYSLLHL GYDLS+++L+ FRQWGSKTPGHPE HT GVEATTGPLGQG+A 
Sbjct: 64 LSAGHGSMLLYSLLHLTGYDLSLEELQNFRQKGSKTPGHPEYGHTPGVEATTGPLGQGVA 123 

Query: 126 NAVG^W^AHLAAKFNKPGFDLvDHYTYTL^^ 185 

AVGMAMAE HIAA +N+ G+++VDHYTYT+ GDG LMEGVS EARSLAGHLKLG+++LL 
Sbjct: 124 I^VGMAMAERHIAATYNFJ3GYNIVDHYTYTICGDGDLMEGVSAEARS]^GHLKLGRMILL 183 

Query: 186 YDSNDISLDGPTSQSFTEDVKGRFESYGWQHILVKDGNDLEAIAAAIEAAKAETDKPTII 245 

YDSNDISLDG SF+E V+ RF++YGW + V+DGN+L+ IA AIE AKA+ ++P++I 
Sbjct: 184 YDSM)ISLDGDLHHSFSESVEDRFKAYGWHVTOVEDGITOLDEIAKaiEEAKAD-ERPSLI 242 

Query: 246 EVKTIIGFGAEKQ3TSSV-HGAPLG&EGITFAKKAYVWEYP-DFTVPAEVADRFASDLQA 303
EVKT IGFG+ +G SV HGAPLGA+ + K+AY W Y +F +P I

[The Query and Sbjct rows for the remainder of this alignment are not legible. Surviving start positions: Sbjct 422; Query 481; Query 541 / Sbjct 542; Query 601 / Sbjct 602. The surviving match lines are:]

+GA+ EE+WN+LFA+Y+ YPEIA++++ AG + ++++G SVA+R SS +
+P L+GGSADL++SN T4+ E +F +Y+GRN+WFGVREFAM AAMNG+A
LHGG +V+G TFFVFS+YL PA+R+AAL LP +YV THDSIAVGEDGPTHEP+EQLAS+
R+MP L+VIRPADGNE+ AAW+ A+ D+PT LVL+RQNLP LEG + A +GV+KGAY
PS VT RLAIE GSS GW KYVG G + ID +GASAPG RI EE+GFTV++ V+ K L

There is also homology to SEQ ID 520.
There is also homology to SEQ ID 520. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 944 

A DNA sequence (GBSx1002) was identified in S.agalactiae <SEQ ID 2869> which encodes the amino
acid sequence <SEQ ID 2870>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4477 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9529> which encodes amino acid sequence <SEQ ID 9530> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2871> which encodes the amino acid
sequence <SEQ ID 2872>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4581 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 27/79 (34%), Positives = 45/79 (56%)

Query: 3 MKKECRDFYRQIQHTYNDI SVREDAVL S S I LL S ASNGL I KTSDVPRVAYELTQQLENNE I 62

M+K+ + Y 1+ Y+ RE+ hS +IJQ+ASN LIK S+ VAY+L Q ++N + 
Sbjct: 1 MEKKRQRLYDVIRQAYDYPEMEOTALSQLLimSNRL-KHSMPLLVAYQLNQDVDNYLL 60 


Query: 63 EKS FESIATVKELKKSAKK 81 

+ ++ K+S +K 

Sbjct: 61 DNDILLPKSLCRFKQSLEK 79 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 945 

A DNA sequence (GBSx1003) was identified in S.agalactiae <SEQ ID 2873> which encodes the amino
acid sequence <SEQ ID 2874>. This protein is predicted to be ABC transporter, ATP-binding protein.
Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2610 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB49925 GB:AJ248286 ABC transporter, ATP-binding protein [Pyrococcus abyssi]
Identities = 96/243 (39%), Positives = 164/243 (66%), Gaps = 2/243 (0%)

Query: 1 MIKFEHVSKVYGEKEALSDLTLSVKDGEIFGLIGHNGAGKTTTISILTSIIDATYGQVYI 60 
MI E++ K +G KE L ++ +VKDGEI+GL+G NG+GK+TT+ IL+ II G+V +

Sbjct: 1 MIIVENLRKRFGGKEVLKGISFTVKDGEIYGLLGPNGSGKSTTMRILSGIITDFEGKVIV 60 



Query: 61 DDLLLTEHRDQIKKKIGYVPDSPDIFLiNLTAEEYWYFIjAKIYDVAPEDIEARITKLVDIF 120
+ + + Q+K+ +GYVP4-+P ++ +LT E++ F+ + + + +E R+ KLV+ F
Sbjct: 61 GGVEVAKDPLQVKRIVGYVPETPALYESLTPAEFFSFVGGVRGIPKDILEERVRKLVEAF 120



Query: 181 NGKTVIFSTH^/IAVAEQLCDRIGILKQGKLIF\ r GSLGELKMKYPDKDLETIYLELAGRQA 240 

GK+++FSTHVLA+AE +CDR+GI+ QG++I G++ ELK ++ LE ++L+L QA 
Sbjct: 181 EGKSIVFSTHVLALAELICDRVGI I YQGRI IAEGTVEELKEISKEERLEDVFLKLT- -OA 238 

Query: 241 SRE 243 
E 

Sbjct: 239 KEE 241 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2875> which encodes the amino acid
sequence <SEQ ID 2876>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2723 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 182/244 (74%), Positives = 215/244 (87%)



Query: 1 MIKFEHVSKVYGEKEALSDLTLSVKEGEIFGLIGHNC-AGKTTTISILTSIIDATYGQVYI 60 

MI+F+HVSK+YG+KEALSDL +++ DGE I FGL I GHNGAGKTTT I S I LTS 1 1 +A+YG+V++ 
Sbjct: 1 MIEFKHVSKLYGDKEALSDLNVTINDGE I FGL IGHNGAGKTTTIS ILTSI IEASYGEVFV 60 

Query: 61 DDLLLTEHRDQIKKKIGYVPDSPDIFENLTAEEYWYFLAKIYDVAPEDIEARITKLVDIF 120 

D LLTE+R+ IKK+I YVPDSPDIFLNLT EYW FLAKIY V+ ED E R+ +L +F 
Sbjct: 61 DGQLLTENREAIKKQIAYVPDSPDIFL1-IL7PNEYWQFLAKIYGVSDEDREERLAQLTTLF 120 

Query: 121 ELEEQRYNPIESFSHGMRQKVIVIGALLPNPDIWILDEPLTGLDPQASFDLKEMMKEHAK 180 

EL+E+ I+SFSHGMRQKVIVIGAL+ NP+IWILDEPLTGLDPQASFDLKEMMK HA 
Sbjct: 121 ELKEEVNQTIDSFSHGMRQKVIVIGALVSNPMIWILDEPLTGLDPQASFDLKEMMKAHAA 180 

Query: 181 NGKWIFSTHVLAVAEQLCDRIGILKQGKLrFVGSLGELKMKYPDKDLETIYLEIAGRQA 240 

+G TV+FSTHVL+VAEQLCDRIGILK+GKLIFVG++ ELK +PDKDLE+ IYLELAGR+A 
Sbjct: 181 SGHTVLFSTHVLSVAEQLCDRIGILKKGKLIFVGTIDELKEHHPDKDLESIYLELAGRKA 240 

Query: 241 SREG 244 
EG 

Sbjct: 241 QEEG 244 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 946

A DNA sequence (GBSx1004) was identified in S.agalactiae <SEQ ID 2877> which encodes the amino
acid sequence <SEQ ID 2878>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.43 Transmembrane - 520 ( 495 - )
INTEGRAL Likelihood =-12.58 Transmembrane 427 - 443 ( 400 - 449)
INTEGRAL Likelihood =-10.99 Transmembrane 151 - 167 ( 144 - )
INTEGRAL Likelihood = -8.44 Transmembrane 194 - 210 ( 189 - 214)
INTEGRAL Likelihood = -7.96 Transmembrane 48 - 64 ( 46 - 68)
INTEGRAL Likelihood = -7.32 Transmembrane 350 - 366 ( 348 - 378)
INTEGRAL Likelihood = -6.69 Transmembrane 475 - 491 ( 474 - 501)
INTEGRAL Likelihood = -6.00 Transmembrane - 335 ( 318 - 337)
INTEGRAL Likelihood = -5.73 Transmembrane 252 - 268 ( 244 - 271)
INTEGRAL Likelihood = -4.78 Transmembrane 125 - 141 ( 121 - 148)
INTEGRAL Likelihood = -4.51 Transmembrane 76 - 92 ( 71 - 98)
INTEGRAL Likelihood = -3.56 Transmembrane 406 - 422 ( 400 - 426)

Final Results

bacterial membrane --- Certainty=0.6371 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2879> which encodes the amino acid 
sequence <SEQ ID 2880>. Analysis of this protein sequence reveals the following: 



Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane 524 - 540
INTEGRAL Likelihood =-10 Transmembrane
INTEGRAL Likelihood = -8 Transmembrane
INTEGRAL Likelihood = -8 Transmembrane
INTEGRAL Likelihood = -8 Transmembrane
INTEGRAL Likelihood = -7 Transmembrane 139 - 155
INTEGRAL Likelihood = -6 Transmembrane 261 - 277
INTEGRAL Likelihood = - Transmembrane 446 - 462 ( 444 - 464)
INTEGRAL Likelihood = - Transmembrane 369 - 385 ( 367 - 387)
INTEGRAL Likelihood = Transmembrane 87 - 103 ( 87 - 104)
INTEGRAL Likelihood = - Transmembrane 334 - 350 ( 334 - 350)

Final Results

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9173> which encodes the amino acid sequence 
<SEQ ID 9174>. Analysis of this protein sequence reveals the following: 



Possible site: 51

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane 153 - 169 ( 144 - )
INTEGRAL Likelihood = Transmembrane 510 - 526 ( 494 - )
INTEGRAL Likelihood = Transmembrane 49 - 65 ( 46 - )
INTEGRAL Likelihood = Transmembrane 407 - 423 ( 400 - )
INTEGRAL Likelihood = Transmembrane 194 - 210 ( 189 - )
INTEGRAL Likelihood = Transmembrane 490 - 506 ( 479 - )
INTEGRAL Likelihood = Transmembrane 125 - 141 ( 120 - )
INTEGRAL Likelihood = Transmembrane 247 - 263 ( 243 - )
INTEGRAL Likelihood = Transmembrane 432 - 448 ( 430 - )
INTEGRAL Likelihood = Transmembrane 355 - 371 ( 353 - )
INTEGRAL Likelihood = Transmembrane - 89 ( 73 - )
INTEGRAL Likelihood = Transmembrane - 336 ( 320 - 336)

Final Results

bacterial membrane --- Certainty=0.673 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 255/542 (47%), Positives = 378/542 (69%), Gaps = 12/542 (2%)

Query: 1 MNWSRIWELVKINILYSNPQTLSALRKKQEKHPKKEFSAYKSMFRNQLFQILLFSIIYVF 60 

MNWS IWEL+KINILYSNPQ+L+ L+K+QEKHPK+ F AYKSM R Q I +F +IY+F 
Sbjct: 15 MNWSTIWELIKINILYSNPQSLANLKKRQEKHPKENFKAYKSMMRQQALMIAMFLVIYLF 74 

Query: 61 LFVSLDFKEYPGYFTFYIGIFTLVSIIYSFIAMYSVFYESDDVKQYAYLPIKSEELYVAK 120 

+F+ +DF YPG F+F + +F ++S + +F ++Y++FYES+D+K Y +LP+ SEELY+AK 
Sbjct: 75 MFIGVDFSHYPGLFSFDVAMFFIMSTLTAFSSLYTIFYESNDLKLYIHLPVTSEELYIAK 134 

Query: 121 I FATFGMSVTFLMPILTLMIVAYWRI IGGPLAVLLAI INFAILFLSVTVI SLYINSLIGR 180 

I ++ GM FLMP+++L+++AYW+++G PL++L+AI+ F +L +S V+++YIN+ +G+ 
Sbjct: 135 IVSSLGMGAVFLMPLISLLLIAYWQLLGNPLSILVAIVLFLVLLVSSMVLftlYINAWVGK 194 

Query: 181 AIIRSANRKLISTILISLATFGAIVPLLFVNMTSQK--MVQGKLQDIAPIPYVRGYYDIV 238 

I+RS RKLISTI++ ++TFGA V + +N+++ K M G D IPY +G+YD+V 
Sbjct: 195 IIVRSRKRKLISTIMMFVSTFGAFVLIFAINISNNKRTMTDGVFTDYPTIPYFKGFYDW 254 

Query: 239 TAPFSMESLLNYYLPLLIILFLIGAIYKWVMPRYYQELLY GQVKQRK— VHRQIDF 292 

APFS +LLN++LPLL+IL ++ I VMP YY+E Y +VKQ K V+R 

Sbjct: 255 QAPFSTAALLNFWLPLLLILAMWGIVTKVNPTYYREAFYISl^NKTOQTKKPVNRP--- 311 

Query: 293 SKRESINKTLVKHHLSSLQNATLLTNTFLMPLLYI^FIVPILNNGKEIGRFFNENYFGI 352 

+ +S+ + L BCHHL +LQNATLLT T+LMPL+Y+ +FI P L+ G + + +YFG+ 
Sbjct: 312 HQNQSIiAQLLRKlIHLLTLQNATLLTC2TYLMPIjMYVMLFIGPSLSRGTGFFKHISPDYFGV 371 

Query: 353 AFLAGILIGSLCVMPASIVGVGISLHKSNFYFIKSLPISFSYFLKHKFVTLITLQLAVPT 412 

A L G+ +G +C PS +GVGISLEK NF FIKSLPI+ ' FL KF L+ LQL VP 
Sbjct: 372 ALLFGVSLGVMCATPTSFIGVGISLEKDNFTFIKSLPITLKKFLMDKFCLLVGLQLIVPM 431 

Query: 413 FIYFLVGFFLLIO^SILVLLSFiJ^LVFMGLIEGQFIYRRDYKHLFLNWQEVTQLFNRGLG 472 
IY + G F+L L L+ ++F LG +++G+ +YRRDY+ L L WQ++TQLF RG G 

Sbjct: 432 VIYLVFGLFVLHLHPLLTIAFCLGYALSLIVQGELMYRRDYRLLDLKWQDMTQLFTRGDG 491 

Query: 473 QWLLVGSLFGMMIIGSFL-IGISIFWSMVWNTOAWIIILIIGLLILSICQYLLLKNFWK 531 

QWL +G +FG +1+ L G I +4-+ + + L++L + Q + K FWK 

Sbjct: 492 QWLTMGLIFGNLIVAGVLGFGAVIIANIIQQPLLISILLSCLILMVLGLAQLWIQKTEWK 551 

Query: 532 KL 533 
L 

Sbjct: 552 SL 553 
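
The summary line that heads each alignment can be checked mechanically. The following Python sketch is not part of the original disclosure: the helper name `parse_alignment_stats` and the regular expression are illustrative assumptions about the line layout used in this document. It extracts the counts and recomputes each ratio. The recomputed values need not match the printed whole-number percentages exactly, because the rounding convention of the alignment program is not stated here.

```python
import re

# Assumed layout: "Name = count/length (pct%)", as in the summary lines of this document.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_stats(line):
    """Return {name: (count, length, printed_pct, recomputed_pct)}."""
    stats = {}
    for name, count, length, printed in STAT_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, int(printed), 100.0 * count / length)
    return stats


# Summary line of the GAS/GBS alignment shown above.
summary = "Identities = 255/542 (47%), Positives = 378/542 (69%), Gaps = 12/542 (2%)"
for name, (count, length, printed, recomputed) in parse_alignment_stats(summary).items():
    print(f"{name}: {count}/{length} printed {printed}% recomputed {recomputed:.2f}%")
```

Keeping both the printed and the recomputed percentage makes it easy to spot summary lines whose counts were damaged during text extraction.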

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 947 

A DNA sequence (GBSx1005) was identified in S.agalactiae <SEQ ID 2881> which encodes the amino
acid sequence <SEQ ID 2882>. Analysis of this protein sequence reveals the following: 

Possible site: 44
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.12 Transmembrane 242 - 258 ( 239 - 265)
INTEGRAL Likelihood = -7.64 Transmembrane 430 - 446 ( 421 - 450)
INTEGRAL Likelihood = -5.84 Transmembrane 120 - 136 ( 113 - 139)
INTEGRAL Likelihood = -5.52 Transmembrane 212 - 228 ( 210 - 232)
INTEGRAL Likelihood = -5.20 Transmembrane 287 - 303 ( 283 - 313)
INTEGRAL Likelihood = -3.56 Transmembrane 148 - 164 ( 143 - 166)
INTEGRAL Likelihood = -0.48 Transmembrane 382 - 398 ( 382 - 398)

Final Results
bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
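
Localisation reports of this kind follow a regular line layout, so the predicted transmembrane segments and the final certainty values can be pulled out with two regular expressions. The sketch below is illustrative only and is not the program that produced the report: the function name `parse_report` and both patterns are assumptions based on the lines shown in this Example, and the sample text is the report above with extraction damage removed.

```python
import re

# Assumed line layouts, taken from the report printed in this Example.
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)
RESULT_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W+Certainty\s*=\s*(\d+\.\d+)"
)


def parse_report(text):
    """Return (segments sorted by start position, {location: certainty})."""
    segments = sorted(
        (int(start), int(end), float(score))
        for score, start, end in INTEGRAL_RE.findall(text)
    )
    certainties = {loc: float(val) for loc, val in RESULT_RE.findall(text)}
    return segments, certainties


report = """
INTEGRAL Likelihood = -8.12 Transmembrane 242 - 258 ( 239 - 265)
INTEGRAL Likelihood = -7.64 Transmembrane 430 - 446 ( 421 - 450)
INTEGRAL Likelihood = -5.84 Transmembrane 120 - 136 ( 113 - 139)
INTEGRAL Likelihood = -5.52 Transmembrane 212 - 228 ( 210 - 232)
INTEGRAL Likelihood = -5.20 Transmembrane 287 - 303 ( 283 - 313)
INTEGRAL Likelihood = -3.56 Transmembrane 148 - 164 ( 143 - 166)
INTEGRAL Likelihood = -0.48 Transmembrane 382 - 398 ( 382 - 398)
bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

segments, certainties = parse_report(report)
best = max(certainties, key=certainties.get)  # location with the highest certainty
print(len(segments), "segments; first:", segments[0])
print("predicted location:", best, certainties[best])
```

Sorting the segments by start position gives the order in which the predicted helices occur along the sequence, which the report itself lists by likelihood instead.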

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15963 GB:Z99124 phosphotransferase system (PTS) beta-glucoside-specific enzyme IIABC component [Bacillus subtilis]
Identities = 175/447 (39%), Positives = 266/447 (59%), Gaps = 10/447 (2%) 

Query: …   EYITLSKNIIKHLGGQNNINNVYHCQTRU^SLNDPTKVHLEQLKTLKEVXTWISGGQH 63
           +Y LSK+I++ +GG+ N+ V HC TRLRF+L+D K + QL+ L V ISG Q
Sbjct: 2   DYDKLSKDILQLVGGEENVQRVIHCMTRLRFNLHDNAKADRSQLEQLPGVMGTNISGEQF 61

[The sequence rows for Query positions 64 - 418 are not legible in the source. Legible row starts: Query 64, 121, 181, 241, 300, 360; Sbjct 62, 120, 239, 357. The surviving match lines are:]

GMIK L+AL + F + SQ +++L DG FYFLP+L+A+ +AA+K +NP +A

+GP+G I G+YL+ +L +A A FL G

ITA MGITEP +YGVN+ + P A+LIGG GG + G+ + V G++GLP + ++I

Query: 419 SHTSTHLFITMLIAVIITVSTTAILTF 445
           T + I ++IA S +L F
Sbjct: 417 GPTFIYAMIGLVIAFAAGTSAAYLLGF 443

There is also homology to SEQ ID 2884. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 948 

A DNA sequence (GBSx1006) was identified in S.agalactiae <SEQ ID 2885> which encodes the amino
acid sequence <SEQ ID 2886>. This protein is predicted to be gamma-glutamyl kinase (proB). Analysis of 
this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.11 Transmembrane 160 - 176 ( 160 - 176)

Final Results
bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKRHFETTRHIVIKVGTSSLVQTSGKINLSKIDHLAFVISSLr«JRGMEVILVSSGAMGFG 60 

MKR+F++ +R+VIK+GTSSLV SGKINL KID LAFVISSL N+G+EV+LVSSGAMGFG 
Sbjct: 1 MKRI<TFDSVKRLVIK1GTSSLVLPSGKINLEKIDQLAFVISSLHNKGIEWLVSSGAMGFG 60 

Query: 61 LDILKMDKRPQEISQQQAVSSVGQVAMMSLYSQIFSHYQTHVSQILLTRDVWFPESLQN 120 

L++L ++KRP E+ +QOAVSSVGQVAMMSLYSQ+FSHYQT VSQ+LLTRDW + ESL N 
Sbjct: 61 LNVLDLEKRPAEVGKQQAVSSVGQVAMSLYSQVFSHYQTKVSQLLLTRDVVEYSESLAN 120 

Query: 121 VTNSFESLLSMGILPIVNENDAVSVDEMDHKTKFGDNDRLSAWAKITKADLLIMLSDID 180 

N+FESL +G++PIVNENDAVSVDEMDH TKFGDNDRLSA+VAK+ ADLLIMLSDID 
Sbjct: 121 AINAFESLFELGWPIVNEm3AVSVDEMDHATKFGDNDRLSAIVAKWGADLLIMLSDID 180 

Query: 181 GLFDKNPNIYDDAVLRSHVSEITDDIIKSAGGAGSKFGTGGMLSKIKSAQMVFDNNGQMI 240 

GLFDKNPN+Y+DA LRS+V EIT++I+ SAGGAGSKFGTGGM+SKIKSAQMVF+N QM+ 
Sbjct: 181 GLFDKNPNVYEDATLRSYVPEITEEILASAGGAGSKFGTGGMMSKIKSAQMVFENQSQMV 240 

Query: 241 LMNGANPRDILKVLDGHNIGTYFAQ 265 

LMNG NPRDIL+VL+G IGT F Q 
Sbjct: 241 LMNGENPRD I LRVLEGAKIGTLFKQ 265 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2887> which encodes the amino acid
sequence <SEQ ID 2888>. Analysis of this protein sequence reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.97 Transmembrane 163 - 179 ( 163 - 179)
INTEGRAL Likelihood = -0.06 Transmembrane 124 - 140 ( 124 - 140)

Final Results
bacterial membrane --- Certainty=0.1786 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA63147 GB:X92418 gamma-glutamyl kinase [Streptococcus thermophilus]
Identities = 212/265 (80%), Positives = 237/265 (89%)

[The Query and Sbjct sequence rows of this alignment are not legible in the source. Legible row starts: Query 124, 184, 244; Sbjct 61, 121, 181, 241. The surviving match lines are:]

MKR F+ V R+VIKIGTSSLVLP+GKINL3KIDQLAFVISSL NKG EV+LVSSGAMGFG

L++L +EKRP 4- KQQAVSSVGQVAMMSLYSQ+F++YQT VSQ+LLTRDW + ESLAN

NAFESL LG+VPIVNENDAVSVDEMDHATKFGDNDRLSA+VA + ADLLIMLSDID

GLFDKNP +YEDA LRS+V IT+EI+ASAGGAGSKFGTGGM+SK++SAQMVFEN+ QMV



An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/265 (81%) , Positives = 242/265 (90%) 



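[The Query and Sbjct sequence rows of this alignment are not legible in the source. Legible row starts: Query 1, 61, 181, 241; Sbjct 64, 124, 184. The surviving match lines are:]

-++QQA.VSSVGQVAMMSr,YSQIF++YQT+VSQII J LTRDVVVFEESL K

GLFDKNP IY+DA LRSHV+ IT +11 SAGGAGSKFGTGGMLSK++SAQMVF+N GQM+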



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 949 

A DNA sequence (GBSx1007) was identified in S.agalactiae <SEQ ID 2889> which encodes the amino
acid sequence <SEQ ID 2890>. This protein is predicted to be unnamed protein product (proA). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3517 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2891> which encodes the amino acid
sequence <SEQ ID 2892>. Analysis of this protein sequence reveals the following:

Final Results
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA63148 GB:X92418 gamma-glutamyl phosphate reductase 
[Streptococcus thermophilus] 
Identities = 309/416 (74%) , Positives = 355/416 (85%) 



[The Query and Sbjct sequence rows of this alignment are largely illegible in the source. Legible row starts: Query 1, 61, 121, 181, 301, 361; Sbjct 1, 61, 121, 181, 241, 301, 361. The surviving lines are:]

S IM DRL LT +RI IA4GV+QVADL DPIGQV++GYTNLDGLKI +QKRVP+GVIAMI

FESRPNVS+DAFSIAFKTNNAIILRGG+DA++SNKALV + R++L+ +GIT DAVQ VKD

3 VAEELM AT YVD+LIPRGGA+LIQTVKEKAKVPVIETGVGN HIYVD A+LD+A

YT+ HSEAI+T+DI AE FQD \

iSTRFTDGFVFGLGAEIGISTQKMHARGPMGLEALTSTKFYINGQGQIRE 416

An alignment of the GAS and GBS proteins is shown below. 

Identities = 307/417 (73%) , Positives = 353/417 (84%) , Gaps = 1/417 (0%) 

Query: 1   MTYIEILGQNAKKASQSVARLSTASKNEILRDLARNIVADTETILTENARDWKAKDNGI 60
           MT + LGQ AK+AS +A LST KN L LA+ +V DT+T+L N +D+ AK++GI
Sbjct: 1   MTDMRRLGQRAKQASLLIAPLSTQIKNRFLSTLAKALVDDTQTLLAANQKDLANAKEHGI 60

Query: 61  SEIMVDRLRLNKDRIQAIANGIYQVADIMPIGQWSGYTNLDGLKILKKRVPLGVIAMI 120
           S+IM+DRLRL +RI+AIA G+ QVADLADPIGQV-r GYTNLDGLKIL+KRVPLGVIAMI
[The Sbjct 61 row is not legible in the source.]

Query: 121 FESRPNVSVDAFSLAFKTGNAIILRGGKDAIFSNTALWCMRQTLQDTGHNPDIVQLVED 180
           FESRPNVSVDAFSLAFKT NAIII1RGGKDA+ SN ALV 4-RQ+L+ +G PD VQLVED
Sbjct: 121 FESRPNVSVDAFSLAFKTNNAIILRGGKDALHSNKALVKLIRQSLEKSGITPDAVQLVED 180

[The rows starting at Query 181 and Sbjct 181, and the Query 241 row, are not legible in the source.]

           KIVINAKT+RPSVCNAAEGLV+H+A+A F+ LEK + + Q VE+RAD++AL L E
Sbjct: 241 TKIVINAKTKRPSVCNAAEGLVIHEAVAARF I PMLEKAINQV- QPVEWRADDKALPLFEQ 299

Query: 301 AVAASESDYATEFLDYIMSVKVVDSFEQAISWINKYSSHHSEAIITNNISRAEIFQDMVD 360
           AV A D+ TEFLDYIMSVKW S E+AISWIN+Y+SHHSEAIIT +1 AE FQD+VD
Sbjct: 300 AVPAKAEDFETEFLDYIMSVKOTSSLEFAISWINQYT2HHSEAIITRDIKAAETFQDLVD 359

Query: 361 AAAVYVNASTRFTDGFVFGLGAEIGISTQKLHARGPMGLEALTSTKYYINGTGQVRE 417
           AftAVYVNASTRFTDGFVFGLGAEIGISTQK+HARGPMGLEALTSTK+YING G +RE
Sbjct: 360 AAAVYWASTRFTDGFVFGLGAEIGISTQKMHARGPMGLEALTSTKFYINGDGHIRE 416

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 950 

A DNA sequence (GBSx1008) was identified in S.agalactiae <SEQ ID 2893> which encodes the amino
acid sequence <SEQ ID 2894>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1859 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9531> which encodes amino acid sequence <SEQ ID 9532>
was also identified.



A related DNA sequence was identified in S.pyogenes <SEQ ID 2895> which encodes the amino acid 
sequence <SEQ ID 2896>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.0853 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 259/315 (82%) , Positives = 287/315 (90%) 





Query: 1   MTNDFHHITVLLHETVDMLDIKPDGIYVDATLGGAGHSEYLLSQLGPDGHLYAFDQDQKA 60
           MT +FHH+TVLLHETVDMLDIKPDGIYVDATLGG+GHS YLLS+LG +GHLY FDQDQKA
Sbjct: 22  MTKEFHHVTVLLHETVDMLDIKPDGIYVDATLGGSGHSAYLLSKLGEEGHLYCFDQDQKA 81

Query: 61  IDNAHIRLKKYVDTGQVTFIKDNFRNLSSNLKALGVSEINGICYDLGVSSPQLDERERGF 120
           IDNA + LK Y+D GQVTFIKDNFR+L + L ALGV EI+GI YDLGVSSPQLDERERGF
Sbjct: 82  IDNAQVTLKSYIDKGQVTFIKDNFRHLKARLTALGVDEIDGILYDLGVSSPQLDERERGF 141



Query: 121 SYKQDAPLDMRMNREQSLTAYDVVNTYSYHDLVRIFFKYGEDKFSKQIARKIEQVRAEKT 180 

SYKQDAPLDMRM+R+ LTAY+WMTY ++DI1V+IFFKYGEDKFSKQIARKIEQ RA K 

Sbjct: 142 SYKQDAPLDMRJTORQSLLTAYEVVOTYPFNDLVKIFFICYGEDKFSKQIARKIEQARAIKP 201 

Query: 181 ISTTTEIAEIIKSSKSAKELKKKGHPAKQIFQAIRIEVNDELGAADESIQQAMDIjLAVDG 240 

I TTTELAE+IK++K AKELKKKGHPAKQI FQAIRIE VNDELGAADES IQ AM+LLA+DG 

Sbjct: 202 IETTTEIAELIKAAKPAKELKKKGHPAKQIFQAIRIEVITOELGAADESIQDAMELLAIjDG 261 

Query: 241 RISVTTFHSLEDRLTKQLFKEASTVEVPKGIiPFIPDDLQPKMELVNRKPILPSQEErjEAN 300 

RISVITFHSLEDRLTKQLFKEASTV+VPKGLP IP+D++PK ELV+RKPILPS EL AN 

Sbjct: 262 RISVITFHSLEDRLTKQLFECEASTVDVPKGLPLIPEDMKPKFELVSRKPILPSHSELTAN 321 

Query: 301 NRAHSAKLRVARRIR 315

RAHSAKLRVA++IR 
Sbjct: 322 KRAHSAKLRVAKKIR 335 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 951 

A DNA sequence (GBSx1009) was identified in S.agalactiae <SEQ ID 2897> which encodes the amino
acid sequence <SEQ ID 2898>. This protein is predicted to be FtsL. Analysis of this protein sequence
reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.92 Transmembrane 30 - 46 ( 24 - 49)

Final Results
bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC95455 GB:AF068903 YllD [Streptococcus pneumoniae]
Identities = 44/99 (44%) , Positives = 71/99 (71%) 

Query: 5 KRTBAVTCjTI^RHIKTFSRIEKAFyGAIVITAII^VGIIYIXJSNSLQVKQEWQLNSKI 64
4+ E Q LQ +K FSR+EKAFY +1 +T +I+A+ II++Q+ LQV+ ++ ++N++I
Sbjct: 3 BKMEKTGQILQMQLKRFSRVEKAFYFSIAVTTLIVAISIIFMQTKLLQVQNDLTKINAQl 62

Query: 65 NDKQTEFDNAKQEVNELSNRDRITKIAKDAGLTIQNDNI 103
+K+TE D+AKQEVNED +R+ +IA I. + N+NI
Sbjct: 63 EEKKTELDDAKQEVNELLRAERLKEIANSHDLQLNNENI 101

A related DNA sequence was identified in S.pyogenes <SEQ ID 2899> which encodes the amino acid
sequence <SEQ ID 2900>. Analysis of this protein sequence reveals the following:

Possible site: 50
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -5.79 Transmembrane 40 - 56 ( 37 - 58)

Final Results
bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC95455 GB:AF068903 YllD [Streptococcus pneumoniae] 
Identities = 45/94 (47%), Positives = 69/94 (72%)

Query: 24 LQKRIKTFSRIEKAFYTAIIVTAITMAVSIIYLQSRKLQLQQEITSLNSHISDQKLELNN 83 

LQ ++K FSR+EKAFY +1 VT + +A+SII++Q++ LQ+Q ++T +N+ I ++K EL++ 
Sbjct: 12 LQMQLKRFSRVEKAFYFSIAVTTLIVAISIIFMQTKLLQVQNDLTKINAQIEEKKTELDD 71 


Query: 84 AKQEVNELSRRDRI IDIAGKAGLSNRNNNIKKVE 117 

AKQEVNEL R +R+ +IA L N NI+ E 
Sbjct: 72 AKQEVNELLRAERLKE IANSHDLQtiNNENIRIAE 105 

An alignment of the GAS and GBS proteins is shown below.

Identities = 71/108 (65%) , Positives = 87/108 (79%) , Gaps = 1/108 (0%) 






Query: 1 MTNEKRTEftVTQTLQRHIKTFSRIEKAFYGAIVITAIIMAVGIIYLQSNSLQVKQEVNQL 60 

MTNEKRT+ VT LQ+ IKTFSRIEKAFY AI++TAI MAV IIYLQS LQ++QE+ L 
Sbjct: 11 MT1TOKRTQVVTNALQKRIKTFSRIEKAFYTAIIVTAITMAVSIIYLQSRKLQLQQEITSL 70 

Query: 61 NSKINDKQTEFDNAKQEVNELSNRDRITKIAKDAGLTIQNDNIYRKVD 108

NS I+D++ E +NAKQEVNELS RDRI IA AGIrt- +N+NI +KV+ 
Sbjct: 71 MSHISDQECLEIMAKQEVNELSRRDRIIDIAGKAGLSintN^I-KKVE 117 

SEQ ID 2898 (GBS82) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 15 (lane 2; 2 bands).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 952 

A DNA sequence (GBSx1010) was identified in S.agalactiae <SEQ ID 2901> which encodes the amino
acid sequence <SEQ ID 2902>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1435 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 953 

A DNA sequence (GBSx1011) was identified in S.agalactiae <SEQ ID 2903> which encodes the amino
acid sequence <SEQ ID 2904>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 47
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-13.90 Transmembrane 37 - 53 ( 30 - 60)

Final Results
bacterial membrane --- Certainty=0.6562 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2905> which encodes the amino acid 
sequence <SEQ ID 2906>. Analysis of this protein sequence reveals the following: 

Possible site: 42
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-13.06 Transmembrane 33 - 49 ( 24 - 53)

Final Results
bacterial membrane --- Certainty=0.6222 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 480/753 (63%), Positives = 603/753 (79%), Gaps = 8/753 (1%)

Query: 5 KKLKKIFLDYVIHIRDRRSPQKNRERVGQNIjMILTI FLFFI FI INFVI IVGTDSKFGVNL 64 

KK +K LDYV+ RDRR+P +NR RVGQN+M+LTIF+FFIFIINF+II+GTD KFGV+L 
Sbjct: 2 KKWQKYVLDYVV- -RDRRTPVENRVRVGQNMMLLTIFI FFI FI INFMI 1 IGTDQKFGVSL 59 

Query: 65 SKEAKKVYQQSMTVQAI<RGTIYDRNGNPIAEDATTYSIaYAIISKNYTTATGQKLYVQPSQ 124 

S+ AKKVYQ+++T+QAKRGTIYDRNG IA D+TTYS+YAH- K++ +A+ +KLYVQPSQ 
Sbjct: 60 SEGAK1WYQETVTIQAKRGTIYDRNGTAIAVDSTTYSIYAILDKSFVSASDEKLYVQPSQ 119 

Query: 125 YEKVASILENKLGMKKNLVLKQLNQKKLFQVSFGSSGSGLSYTKMADIKKTMEKSDIKGI 184 

YE VA IL+ LGMKK V+KQL +K liFQVSFG SGSG+SY+ M+ I+K ME + IKGI 
Sbjct: 120 YETVADILKKHLGMKKTDVIKQLKRKGLFQVSFGPSGSGISYSTMSTIQKAMEDAKIKGI 179 

Query: 185 GFSTSPGRIYPNGIFASQFIGF-TLPQDDGDG-KKLVGNTGLEAALNKVLSGTDGKVTYE 242 

F+TSPGR+YPNG FAS + FIG +L +D G K LVG TGLEA+ 4K+LSG DG +TY+ 
Sbjct: 180 AFTTSPGRMYPNGTFASEFIGLASLTEDKKTGVKSLVGKTGLEASFDKILSGQDGVITYQ 239 

Query: 243 KDRSGNVLLGTATTERRAVNGKDIYTTLSEPIQTVLETQMDVFAEKTKGKFASATVVNAK 302 

KDR+G LLGT T ++A++GKDIYTTLSEPIQT LETQMDVF K+ G+ ASAT+VNAK 
Sbjct: 240 KDRNGTTLLGTGKTVJCKAIDGKDIYTTLSEPIQTFLETQMDVFQAKSNGQLASATLVNAK 299 

Query: 303 TGEIIATSQRPTYNPSTLKGYDKKNLGTYOT?LI,YDNFFEPGSTMKVMTIiASAIDSKHFNS 362 

TGEIIAT+QRPTYN TLKG + N Y+ L N FEPGSTMKVMTLA+AID K FN 
Sbjct: 300 TGEILATTQRPTYNADTLKGLE^^^NYKWYSALHQGN-FEPGSTMKVMTLAAAIDDKVFNP 358 

Query: 363 TEVYNSAQ-YKIADAIIRDTOVMGLSSGSYMTFPQGFAHSSWGMVTLEQKMGRDKWLN 421

E +++A I ADA I+DW +NEG+S+G YM + QGFA SSNVGM LEQKMG KW+N 
Sbjct: 359 NETFSNANGLTIADATIQDWSINEGISTGQYMNYAQGFAF 418 

Query: 422 YLSKFKFGYPTRFGMLHESGGLFPSDNEVTIAMSSFGQGIGVTQVQMLRAFTSISNDGVM 481 

YL+KF+FG+PTRFG+ E G+FPSDN VT AMS+FGQGI VTQ+QMLRAFT+ISN+G M 
Sbjct: 419 YLTKFRFGFPTRFGLKDEDAGIFPSDNIVTQAMSAFGQGISVTQIQMLRAFTAISNNGEM 478 

Query: 482 LQPQFISSIYDPNTGTSRTARKEWGKPVSKEAASKTRDYMVTVGTDPYYGTLYA-AGAP 540 

L+PQFIS IYDPNT + RTA KE+VGKPVSK+AAS+TR YM+ VGTDP +GTLY+ P 
Sbjct: 479 LEPQFISQIYDPNTASFRTANKEIVGKPVSKKAASETRQYMIGVGTDPEFGTLYSKTFGP 538 

Query: 541 VIQVGNQSVAVKSGTAQIAQEGGGGYLQ-GKNDTINSWAMVPSENPDFIMYVTIQQPEK 599 

+ I+VG+ VAVKSGTAQI E G GY G + + SWAMVP++ PDF+MYVT+ +P+ 
Sbjct: 539 IIKVGDLPVAVKSGTAQIGSEDGSGYQDGGLTl^'VYSWAIWPADKPDFLMYVTMTKPQH 598 

Query: 600 FSITFWKDWMPVLEQATAMKETILKPGLNDSEHQTKYKLSKIVGENPGHVAEELRRNLV 659 

F FW+DVVNPVLE+A M++T+ KP ++D+ QT YKL VG+NPG + ELRRNLV 
Sbjct: 599 FGPLFWQDWMPVLEEAYLMQDTLTKPWSDANRQTTYKLPNFVGKNPGETSSELRRNLV 658 

Query: 660 QPIILGNGSK^SKVSKRPGANLAEfmQIjLVLTOKLTELPDMYGWSKANVEQFAKWTGIKV 719 

QP++LG GSK+ KVS +PG L EH+Q+L+L+++ E+PDMYGW+K+NV+ FAKWTGI + 
Sbjct: 659 QPWLGTGSKIKKVSHQPGQTLTENQQVLILSDRFVEVPDMYGWTKSNVKTFAKWTGIDI 718 

Query: 720 TYKGSTSGKVRKQSIDVGKSINKIKKIKITIGD 752 

++KG+ SG+V KQS+DVGKS+ KIKK+ IT+GD 
Sbjct: 719 SFKGTDSGRVMKQSVDVGKSLKKIKKMTITLGD 751 
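
The Identities and Positives totals quoted for an alignment are sums over its rows. The following Python sketch is not part of the original disclosure and the function name `count_row` is hypothetical. It counts both quantities for the final row of the alignment above, under the assumption that a letter on the match line marks an identical residue pair and a `+` marks a conservative substitution.

```python
def count_row(query, midline, subject):
    """Count identities, positives and gap positions in one alignment row."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    # Identical residue pairs, ignoring gap characters.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # Every non-blank match-line character (letter or '+') counts as a positive.
    positives = sum(1 for m in midline if m != " ")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gaps


# Final row of the GAS/GBS alignment (Query 720 - 752, Sbjct 719 - 751).
query = "TYKGSTSGKVRKQSIDVGKSINKIKKIKITIGD"
midline = "++KG+ SG+V KQS+DVGKS+ KIKK+ IT+GD"
subject = "SFKGTDSGRVMKQSVDVGKSLKKIKKMTITLGD"

print(count_row(query, midline, subject))  # (21, 29, 0)
```

Summing these per-row counts over every row of an alignment should reproduce the totals in its summary line, provided the rows are free of extraction errors.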

A related GBS gene <SEQ ID 8691> and protein <SEQ ID 8692> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -4.31 
GvH: Signal Score (-7.5): -7.07 

Possible site: 47
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -13.90 threshold: 0.0
INTEGRAL Likelihood =-13.90 Transmembrane 37 - 53 ( 30 - 60)
PERIPHERAL Likelihood = 5.30 450
modified ALOM score: 3.28

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.6562 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF0041K301 - 2556 of 2856) 

GP|677911l|emb|CAH70457.l| |A94911(1 - 752 of 752) unnamed protein product {unidentified}, 
homology to penicillin-binding protein 2x (S. pneumoniae)

%Match = 77.4

%Identity = 99.7 %Similarity = 99.9

Matches = 750 Mismatches = 1 Conservative Sub.s = 1 
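
The percentages quoted above are consistent with simple ratios over the aligned length. The short Python check below is an illustration rather than part of the original analysis. It assumes that the aligned length is the sum of the three counts and that %Similarity counts identical plus conservative positions, neither of which the source states explicitly.

```python
# Counts as reported for this alignment.
matches = 750
mismatches = 1
conservative = 1

# Assumption: the three counts partition the aligned positions (750 + 1 + 1 = 752).
aligned = matches + mismatches + conservative

identity = 100 * matches / aligned
similarity = 100 * (matches + conservative) / aligned

print(f"aligned length {aligned}")
print(f"%Identity = {identity:.1f}  %Similarity = {similarity:.1f}")  # 99.7 and 99.9
```

The total of 752 agrees with the "1 - 752 of 752" range given for the database entry.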

20 66 96 126 156 186 216 246 276 

RIEKAPYGAIVITAIIMAVGIIYLQSNSLQWQEVNQLMSKINDKQTEFDNAKBEV^LSNRDRITKIAKDAGLTIQNDN 



306 336 366 396 426 456 486 516 

IYRKVD*SVTFFKKLKKIFLDWIHIRDRRSPQKNRERVGQNLMILTIFLFFIFIINFVIIVGTDSKFGVmjS]^KKVY 

25 MMMMMMMMIIIIMIIMMIIIIIMMMMMMIIIIIMMIIIIIIIIMIMIMM 

VTFFKKLKK1FLDYVIHIRDRRSPQKHRERVGQNLMILTIFLFFIFIINFV1IVGTDSKFGWLSKEAKKVY 
10 20 30 40 50 60 70 

546 576 606 636 666 596 726 756 

30 QQSMTVOAKRGTIYDRNGNPIAEDATTYSLYAI I SK^TfTTATGQKLYVQPSQYEKVASILENKIjGMKKI^VLKQIiNQKKIj 

iiiiiiiiiiiiimimiiiiiiimiiiiiiiiiimmmimiiiiimiiiiiiiiiiiiiiimi 

QQSMTVQAKRGTIYDRNGNPIAEDATTYSLYAIISK]sr^TTATGQKLWQ?8QyEKA/ASILENKI/]}MKK^ILVLKQLNQK^ 
90 100 110 120 130 140 150 



35 786 816 846 876 906 936 966 996 

FQVSFGSSGSGLSYTKMADIKKTMEKSDIKGIGFSTSPGRIYPNGIFASQFIGFTLPQDDGDGKKLVGNTGLFAALNKVL 
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FQVSFGSSGSGLSYTKMADIKKTMEKSDIKGIGFSTSPGRIYPNGIFASQFIGFTLPQDDGDGKKLVGMGLFJWUjNKVL 
170 180 190 200 210 220 230 

40 

1026 1056 1086 1116 1146 1176 1206 1236 

SGTDGKVTYEKDRSGNVLLGTATTERRAWGKDIYTTLSEPIQTVXETQMDVFAEKTKGKFASATWNAKTGEIIATSQR 

miimiMMiiiimimimiimiiMiiiimiiiiiimiiiiiiiiiiiiiiiiiiimiiiii 

SGTDGKVTYEKDRSGNVLLGTATTERPATOGKDIYTTLSEPIQTVLETQMDVFAEKTKGKFASATVVNAKTGEILATSQR 
45 250 260 270 280 290 300 310 



1266 1296 1326 1356 1386 1416 1446 1476 

PTYNPSTLKGYDKKNLGTYNTLLYDNFFEPGSTMKVMTIASAIDSKHFNSTEVYNSAQYKIADAIIRDTOVIffiGLSSGSY 

mimmmiim ii miiiiiiii ii iiiiiiiimmmiim iii i ii minimi iimiim 

50 PTYNPSTLKGYDKraLGTYOTLLYDNFFEPGSTMromTIASAIDSKHFNSTEVYNSAQYKIADAVIRDTOVNEGLSSGSY 
330 340 350 360 370 380 390 



1506 1536 1566 1596 1626 1656 1686 1716 

mFPQGFAHSSNWSMWLEQKMGRDKWIOTLSKFKFGYPTRFGMLHESGGL 

55 MMMMMMMMMMIMMMMMMMMMIIIMMMMMMMMIMMMIMMMMMMM 

MTFPC^FAHSSNVGMUTLEQKMGRDKWLNYLSKFKFGYPTRFGMLHESGGLFPSDNEVTIAMSSFGO/3IGVTQVQMLRAF 
410 420 430 440 450 460 470 



1746 1776 1806 1836 1866 1896 1926 1956 

60 TSISNDGVMLQPQFISSIYDPNTGTSRTARKEWGKPVSKFJ\ASICTRDYMvTVGTDPYYGTLYAAGAPVIQVGNQSVAVK 

I M M 1 1 1 1 II 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 M I II I II Ml IM 1 1 1 M I M 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 M I II 1 1 II I M I 

TSISNDGVMLQPQFISSIYDPOTGTSRTARKEWGKPVSKEAASKTRDYMVTVGTDPYYGTLYAAGAPVIQVGNQSVAVK 
490 500 510 520 530 540 550 



65 



1986 2016 2046 2076 2106 2136 2166 2196 

SGTAQIAQEGGGGYLQGKiroTINSWAMVPSENPDFIMYOTIQQPEKFSITFW^ 

IMMIIMIIMIIIIIIMIIIIIIIIIIIIIIIIIIillllllllllllMIIMIIIIIIIIIIIMIM Mill 






SGTAQIAQEGGGGYLQGKNDTINSWAWPSENPDFIMYVTIQQPEKFSIT^ 



570 



580 



590 



600 



610 



620 



630 



2226 2256 2286 2316 2346 2376 2406 2436 

HQTKYKLSKIVGENPGHVAEELRFJSILVQPIILGNGSKVSKVSKRPGAISn^ 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 

HQTKYKLSKIVGENPGOTTAEELRRNLVQPIILGNGSKVSKVSKRPGA^ 



650 



670 



680 



690 



700 



710 



2466 2496 2526 2556 2586 2616 2646 2676 

KWTGIKOTYKGSTSGKVRKQS1DVGKSINKIKKIKITIGD*HV 

imiiiimiiimiiiiiiiiiiimmiiiiii 

KWTGIKVTYKGSTSGKVRKQS1DVGKSINKIKKIKITIGD 



730 



740 



750 



SEQ ID 8692 (GBS352d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 145 (lane 15 & 16; MW 105.5kDa). It was also expressed in E.coli as a His- 
fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 145 (lane 17 & 18; MW 
80.5kDa), in Figure 182 (lane 3; MW 80kDa) and in Figure 185 (lane 4; MW 105kDa). Purified 
GBS352d-GST is shown in lane 5 of Figure 236. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 954 

A DNA sequence (GBSx1012) was identified in S.agalactiae <SEQ ID 2907> which encodes the amino
acid sequence <SEQ ID 2908>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1950 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 955 

A DNA sequence (GBSx1013) was identified in S.agalactiae <SEQ ID 2909> which encodes the amino
acid sequence <SEQ ID 2910>. This protein is predicted to be unnamed protein product (mraY). Analysis
of this protein sequence reveals the following:



Possible site: 18
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood =-15.… Transmembrane 56 - 72 ( 47 - 76)
INTEGRAL Likelihood =-14.71 Transmembrane 203 - 219 ( 198 - 223)
INTEGRAL Likelihood = -6.6… Transmembrane 318 - 334 ( 315 - 335)
INTEGRAL Likelihood = -6.64 Transmembrane 83 - 99 ( 79 - 103)
INTEGRAL Likelihood = -5.52 Transmembrane 179 - 195 ( 175 - 197)
INTEGRAL Likelihood = -5.31 Transmembrane 232 - 248 ( 230 - 249)
INTEGRAL Likelihood = -3.08 Transmembrane 119 - 135 ( 119 - 137)
INTEGRAL Likelihood = -2.87 Transmembrane 151 - 167 ( 147 - 167)
INTEGRAL Likelihood = -2.34 Transmembrane 254 - 270 ( 254 - 270)

Final Results
bacterial membrane --- Certainty=0.7050 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 2911> which encodes the amino acid
sequence <SEQ ID 2912>. Analysis of this protein sequence reveals the following: 



[The INTEGRAL transmembrane predictions are largely illegible in the source; one segment survives:]
INTEGRAL Likelihood = … Transmembrane 293 - 309

Final Results
bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB70458 GB:A94911 unnamed protein product [unidentified] 
Identities = 244/309 (78%), Positives = 273/309 (87%), Gaps = 1/309 (0%) 

Query: 1   LKKIG3QQMHEDVKQHLAKAGTPIMGGTVFLLVATAVSLLVSLF-SIKNTQSLALISGIL 59
           LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFL+VA VSL+ S+ S +N+ +L GIL
Sbjct: 28  LKKIGGQQMIIEDWQHIiAKAGTPTMGGTVFLWALLVSLIFSIILSKENSGKLGATFGIL 87

[The rows starting at Query 60, 120 and 180 (Sbjct 148, 208) are legible only in part; the surviving match lines are:]

L +G Y FFVLFWWGFSNAVNLTDGIDGLASISWISL+TYG+IAY Q+QFD+LL+I

MIGALLGFF FNHKPAKVFMGDVGSLALGAMLAAI S I ALRQEWTLL IG VYV ETSS

Query: 240 …3KGKKWSEWQVDAFLWGVGSLAS 299
           VMLQV+YFKYTKKK G G+RIFRMTPFHHHLELGG+SGKG KWSEW+VDAFLW +G S
Sbjct: 268 VMLQVAYFKYTKKKTGVGKRIFRKTPFHHHLELGGVSGKGNKWSEWKVDAFLWAIGIFMS 327

[The final rows, starting at Query 300 and Sbjct 328, are not legible in the source.]



An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/309 (78%) , Positives = 273/309 (87%) , Gaps = 1/309 (0%) 

Query: 28 LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFLIVALLVSLIFSIILSKENSGNLGATFGIL 87

LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFL+VA VSL+ S+ S +N+ +L GIL 
Sbjct: 1 LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFLLVATAVSLLVSLF-SIKNTQSLALISGIL 59 

Query: 88 SWLIYGIIGFLDDFLKIFKQINEGLTPKQKMSLQLIAGLIFYFVHVLPSGTSAINIFGF 147 

S +V+ 1 YGI IGFLDDFLKI FKQINEGLT KQK++LQL+ GL+FYF+HV PSG S+IN+FG+ 
Sbjct: 60 SIWIYGIIGFLDDFLKIFKQINEGLTAKQKIALQLVGGLMFYFLHVSPSGISSINVFGY 119 

Query: 148 YLEVGYLYAFFVLFWWGFSNAWLTDGIDGLASISWISLITYGIIAYNQTQFDILLII 207

L +G Y FFVLFWWGFSNAVNLTDGIDGLASISWISL+TYG+IAY Q+QFD+LL+I 
Sbjct: 120 QLPLGIFYLFFVLFWWGFSNAWLTDGIDGLASISWISLVTYGVIAYVQSQFDVLLLI 179 

Query: 208 VIMIGALLGFWFNHKPAKVFMGDVGSIJUiGAMLAAISIALRQEWTLLFIGFVYVFETSS 267 

MIGALLGFF FNHKPAKVFMGDVGSLALGAMLAAI S IALRQEWTLL IG VYV ETSS 
Sbjct: 180 GAMIGALI^FFCFIfflKPAKVFMGDVGSLALGAiyD^lSIALRQEWTLLIIGIVYVLETSS 239 

Query: 268 VMLQVAYFKYTKKKTGVGKRIFRMTPFHHHLELGGVSGKGNKWSEWKVDAFLWAIGIFMS 327 

VMLQV+YFKYTKKK G G+RIFRMTPFHHHLELGG+SGKG KWSEW+VDAFLW +G S 
Sbjct: 240 VMLQVSYFKYTKKKYGEGRRIFRMTPFHHHLELGGLSGKGKKWSEWQVDAFLWGVGSLAS 299 

Query: 328 AITLAIBYL 336 

+ LAILY+ 
Sbjct: 300 LLVLAILYV 308 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 956 

A DNA sequence (GBSx1014) was identified in S.agalactiae <SEQ ID 2913> which encodes the amino
acid sequence <SEQ ID 2914>. This protein is predicted to be autoaggregation-mediating protein (deaD). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3018 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14444 GB:Z99116 similar to ATP-dependent RNA helicase 
[Bacillus subtilis] 
Identities = 215/436 (49%), Positives = 310/436 (70%), Gaps = 5/436 (1%) 



[The Query and Sbjct sequence rows of this alignment are largely illegible in the source. Legible row starts: Query 3, 63, 122, 182, 242, 302, 362; Sbjct 6, 66, 125, 185, 245, 305, 365. The surviving lines are:]

+D + D VQWITAP+REL QIYQ +1 + E +IR ++GGTDK + I+KLK+

QPH+V+GTPGRI DL+K L++HKA 4- V+DEAD+ LDMGFL VD I

VFSATTP+KL+PFLKKY+ NP ++ V A I++ L+ +K RDK+

j ++F NTK AD + YL+ G+K+ +HGG+ PRERK++M Q+ +LEF YI+ATDL

AARGIDI+GVSHVIN +P DL F+VHRVGRT R G SG A+T+Y+ +D+

< KK+KPGYKKK+ ++++
Sbjct: 365 IEFEYLELEKGEMKKGDDRQRRKKRKKTPNEAD-EIAHRLVKKPKKVKPGYKKKMSYEME 423

Query: 422 EKRRKERRASNRAKGR 437 

+ ++K+RR N++K R 
Sbjct: 424 KIKKKQRR- -NQSKKR 437 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2915> which encodes the amino acid 
sequence <SEQ ID 2916>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2315 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 382/447 (85%) , Positives = 420/447 (93%) 

Query: 1 MSFKDFNFKPYIQRALDELKFVDPTDVQAKLIPWRSGRDLVGESKTGSGKTHTFLLPIF 60

MSFKD++FK Y+Q+AL+E+ FV+PT+VQ +LIP+V SGRDLVGESKTGSGKTHTFLLPIF 
Sbjct: 1 MSFKDYHFKQYVQQALEEIGFVNPTEVQKRLIPIVNSGRDLVGESKTGSGKTHTFLLPIF 60 

Query: 61 EKLDESSDDVQWITAPSRELGTQIYQATKQIAEHSEQEIRVVNYVGGTDKLRQIEKLKV 120 

EKLDE+ +VQWITAPSREL TQI+ A KQIA+H ++EIR+ NYVGGTDKLRQIEKLK 
Sbjct: 61 EKLDEAKAEVQWITAPSREIATQIFDACKQIAKHFQEEIRIJU^WGGTDKLRQIEKLKI) 120 

Query: 121 SQPHIVIGTPGRIYDLVKSGDLAIHKAHTFVVDEADMTLDMGFLDTVDKIAGSLPKDVQI 180 

SQPHIVIGTPGRIYDLVKSGDLAIHKA TFWDEADMT+DMGFLDTVDKIA SLPK VQI 
Sbjct: 121 SQPHIVIGTPGRIYDLVKSGDLAIHKATTFVVDEADMTMDMGFLDTVDKIAASLPKSVQI 180 

Query: 181 LVFSATIPQKLQPFLKKYLTlWV^KIKTATOIADTIDISMLLSTKGRDKNAQIIiELSKLM 240 

LVFSATIPQKLQPFLKKYLTNPV+E+IKT TVIADTIDNWL+STKGRDKN Q+LE+ K M 
Sbjct: 181 LVFSATIPQKLQPFLKXYLTNPVIEQIKTKTVIADTIDNWLVSTKGRDKNGQLLEILKTM 240 

Query: 241 QPYLAMIFVNTKERADELHSYLSSNGLKVAKIHGGIAPRERKRIMNQVKNLEFEYIVATD 300 

QPY+AM+FVNTKERAD+LH++L++NGLKVAKIHGGI PRERKRIMNQVK L+FEYIVATD 
Sbjct: 241 QPYMAMLFVNTKERADDLHAFLTANGLKVAKIHGGIPPRERKRIMNQVKKLDFEYIVATD 300 

Query: 301 LAARGIDIEGVSHVINDAIPQDLSFFVHRVGRTGRNGLSGTAITLYQPSDDSDIRELEKL 360 

IAARGIDIEGVSHVINDAIPQDLSFFVHRVGRTGKNG++GTAITLYQPSDDSDI+ELEK+ 
Sbjct: 301 LAARGIDIEGVSHVINDAIPQDLSFFVHRVGRTGRNGMAGTAITLYQPSDDSDIKELEKM 360 

Query: 361 GINFIPKVIKNGEFQDTYDRDRRNTOEKSYQKLDTEMIGLVKKKKKKIKPGYKKKIQWKV 420 

GI F PKV+KNGEFQDTYDRDRR NRSK+YQKLDTEMIGLVKKKKKK+KPGYKKKIQW V 
Sbjct: 361 GIAFTPKVLKNGEFQDTYDRDRRQTOEKAYQKLDTEMIGLVKKKKKfWKPGYKKKIQWAV 420 

Query: 421 DEKRRKERRASNRAKGRAERKAKKQSF 447 

DEKRRKERRA NRAKGRAERKAKKQ F 
Sbjct: 421 DEKRRKERRAENRAKGRAERKAKKQHF 447 
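
The summary line that heads each alignment can be read mechanically. The following is a minimal Python sketch, not part of the original analysis, that extracts the counts from a summary line of the form shown above and recomputes the identity percentage. The function names are hypothetical, and truncating (rather than rounding) the percentage is an assumption; it agrees with the identity figure reported for this alignment.

```python
import re

# Summary line of the GAS/GBS alignment shown above.
SUMMARY = "Identities = 382/447 (85%), Positives = 420/447 (93%)"

# Matches "<label> = <count>/<length> (<percent>%)" for each reported field.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {label: (count, aligned_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD_RE.finditer(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return 100 * count // length


fields = parse_summary(SUMMARY)
count, length, reported = fields["Identities"]
print(count, length, reported, truncated_percent(count, length))
```

Fields that are absent from a line (here, Gaps) are simply missing from the returned dictionary, so callers can test for them with `in`.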

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 957 

A DNA sequence (GBSx1015) was identified in S.agalactiae <SEQ ID 2917> which encodes the amino 
acid sequence <SEQ ID 2918>. This protein is predicted to be unnamed protein product. Analysis of this 
protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



There is also homology to SEQ ID 2920. 

A related GBS gene <SEQ ID 8693> and protein <SEQ ID 8694> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 

McG: Discrim Score: 8.85 
GvH: Signal Score (-7.5): -1.77 

Possible site: 19 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 8.12 threshold: 0.0 

PERIPHERAL Likelihood = 8.12 182 
modified ALOM score: -2.12 



*** Reasoning Step: ; 



Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

EGAD|126750| collagen binding protein Insert characterized 

GP|1617328|emb|CAA68052.1||X99716 collagen binding protein Insert characterized 

ORF0018K331 - 1089 of 1410) 
EGAD|126750|135177 (23 - 260 of 263) collagen binding protein {Lactobacillus reuteri} 
GP|1617328|emb|CAA63052.1||X99716 collagen binding protein {Lactobacillus reuteri} 
%Match = 11.2 

%Identity = 35.4 %Similarity = 59.0 
Matches = 69 Mismatches = 77 Conservative Sub.s = 46 



KTKFLKLLKSEISSFQAFLLI *NLYHLIRKYYYTDRF* SVRLVI *YFRRILMFKKIILSIATIAATASLAVSVQASEKVE 
::: : | = : I = 11 11= =, II 

MKFWKKALLTIAALTVGTSAGITSVSAASSAVNSELVHKGE 



417 447 477 507 537 567 597 627 

LKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKOTFK 
45 | : : :|::|:|: |: |::||: ||| | | | =|:: |: :||||: |: : |||::| || | 

LTIGLEGTYSPYSYRKNNKLTGFEVDLGKAVAI<KMGLKANFVPTI<in3SLIAGLGSGKFDWMNNITQTPERAKQYNFSTP 
60 70 80 90 100 110 120 



657 687 717 747 

xsrsnyawgki<:gshykslsdlsgkstevlsgvnyaqvlenwnkn-hpn 

= 1 ••!■■= h III 1= II :| I I I- : I II 

YIKSRFALIVPTDSNIKSLICDIKGKKIIAGTGTtOTANVVKKYKGNLTPNGDFASSLDMIKC^RAAGTvNSRFAWYAYSKK 
140 150 160 170 180 190 200 

789 819 849 879 909 939 969 

KKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVKDQSLNLSVSPLKGKIGNNKDGLEY 

= I II : 

NSTKGLKMIDVSSEQDPAKISALF 

220 


999 1029 1059 1089 1119 1149 1179 1209 

LLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQYFGGDYVSNIDK*ISETISFIFLHVRVLRDRITEIESLEKESRRN 

:|l =1 II =1 l = = :|l= =11-111 1 
NKKDTAIQSSYNKALKELQQDGTVKKLSEKYFGADITE 
230 240 250 260 

SEQ ID 8694 (GBS8) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 2 (lane 5; MW 31kDa), Figure 63 (lane 2; MW 31.3kDa), Figure 66 (lanes 2 & 3; 
MW 31kDa), Figure 178 (lane 2; MW 31kDa), Figure 179 (lanes 3 & 4; MW 31kDa) and Figure 180 
(lane 3; MW 31kDa). It was also expressed in E.coli as a GST-fusion product, with SDS-PAGE shown in 
Figure 66 (lanes 4 & 5; MW 56kDa) and in Figure 180 (lanes 4 & 5; MW 55kDa). 

GBS8-His was purified as shown in Figures 189 (lane 7), 211 (lane 3), 228 (lanes 4-5) and 230 (lanes 3-6). 
Purified GBS8-GST is shown in Figure 209, lane 6. 

The GBS8-His fusion product was purified (Figure 90A) and used to immunise mice (lane 2 product; 
12.9 µg/mouse). The resulting antiserum was used for Western blot (Figure 90B), FACS (Figure 90C), and 
in the in vivo passive protection assay (Table III). These tests confirm that the protein is immunoaccessible 
on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 958 

A DNA sequence (GBSx1016) was identified in S.agalactiae <SEQ ID 2921> which encodes the amino 
acid sequence <SEQ ID 2922>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3991 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 959 

A DNA sequence (GBSx1017) was identified in S.agalactiae <SEQ ID 2923> which encodes the amino 
acid sequence <SEQ ID 2924>. This protein is predicted to be probable amino-acid ABC transporter 
permease protein in idh-deor inter. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-11.62 Transmembrane 50 - 66 ( 41 - 74) 
INTEGRAL Likelihood = -0.90 Transmembrane 226 - 242 ( 226 - 242) 
INTEGRAL Likelihood = -0.53 Transmembrane 80 - 96 ( 80 - 96) 
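
The transmembrane predictions above follow a fixed textual pattern, so they can be tabulated programmatically. The sketch below is illustrative only and is not part of the original analysis. It assumes that the first residue range is the predicted segment and that the parenthesised range is an extended region, and it treats the most negative likelihood value as the strongest prediction; both interpretations, and all names in the code, are assumptions.

```python
import re

# The three INTEGRAL lines reported above for SEQ ID 2924.
ALOM_LINES = """\
INTEGRAL Likelihood =-11.62 Transmembrane 50 - 66 ( 41 - 74)
INTEGRAL Likelihood = -0.90 Transmembrane 226 - 242 ( 226 - 242)
INTEGRAL Likelihood = -0.53 Transmembrane 80 - 96 ( 80 - 96)
"""

SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for m in SEGMENT_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),      # predicted segment (assumed meaning)
            "extended": (int(m.group(4)), int(m.group(5))),  # parenthesised range (assumed meaning)
        })
    return segments


segments = parse_segments(ALOM_LINES)
# Assumption: a more negative value indicates a stronger prediction.
strongest = min(segments, key=lambda s: s["likelihood"])
print(len(segments), strongest["core"], strongest["extended"])
```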

Final Results 

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15985 GB:Z99124 similar to amino acid ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 90/224 (40%) , Positives = 137/224 (60%) , Gaps = 10/224 (4%) 

Query: 28 WKAVLDAIPSILERLPITLLLTVAGALFGLILALIFAWKINRVKILYPIQALFVSFLRG 87 

W+ ++ A P++++ LPITL + +A +F +1 LI A++ N++ +L+ + L++SF RG 
Sbjct: 6 WEFMISAFPTLIQALPITLFMAIAAMIFAIIGGLILALITKNKIPVLHQLSKLYISFFRG 65 

Query: 88 TPILVQLMLSYYGI PLFLKFLNQKYGFDWNINAI PASVFAITAFAFNEAAYTSETIRAAI 147 
P LVQL L YYG+P +++ + A AI + AAY +E RAA+ 

Sbjct: 66 VPTLVQLFLIYYGLPQLFPEMSK MTALTAAIIGLSLKNAAYLAEIFRAAL 115 

Query: 148 LSVDQGEIEAARSLGMTSAQVYRRVI I PNAAVVATPTLINTLIGLTKGTSLAFNAGIVEM 207 
SVD G++EA S+GMT Q YRR+I+P A A P NT IGL K TSLAF G++EM 
Sbjct: 116 NSVDDGQLEACLSVGMTKFQAYRRIILPQAIRNAIPATGNTFIGLLKETSLAFTLGVMEM 175 

Query: 208 FAQAQIMGGSDYRYFERYISVALVYWAVSFLIEQLGNAIERKMA 251 

FAQ ++ + +YFE Y++VA+VYW ++ + L + ER M+ 
Sbjct: 176 FAQGKMYASGNLKYFETYLAVAI VYWVLTI I YS I LQDLFERAMS 219 


A related DNA sequence was identified in S.pyogenes <SEQ ID 2925> which encodes the amino acid 
sequence <SEQ ID 2926>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -7.27 Transmembrane 80 - 96 ( 74 - 104) 

INTEGRAL Likelihood = -1.06 Transmembrane 207 - 223 ( 207 - 223) 
INTEGRAL Likelihood = -0.90 Transmembrane 110 - 126 ( 110 - 126) 

Final Results 

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9167> which encodes the amino acid sequence 
<SEQ ID 9168>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.27 Transmembrane 50 - 66 ( 44 - 74) 
INTEGRAL Likelihood = -1.06 Transmembrane 177 - 193 ( 177 - 193) 
INTEGRAL Likelihood = -0.90 Transmembrane 80 - 96 ( 80 - 96) 

Final Results 

bacterial membrane --- Certainty=0.391 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 212/267 (79%), Positives = 238/267 (88%) 

Query: 1 ^QFILTGGWSWYNNLVSQVPAGKLFSWKAVLDAIPSILERLPITLLLTVAGALFGLILA 60 

M LT GW++Y+ L+S +P GKLFSW AV DAIP+I-H+RLPITL LT++GA FGL+LA 
Sbjct: 31 MTSVFLTSGWAFYDYLISPIPHGKLFSWHAVFDAIPNIIQRLPITLGLTLSGATFGLVLA 90 

Query: 61 LIFAVVKINRVKILYPIQALFVSFLRGTPILVQLMLSYYGIPLFLKFLNQKYGFDWNINA 120 
LI FA+VKIN+VK+LYPIQA+ FVS FLRGTPILVQLML+ YYGI PLFLKFLNQKYGFDWN+NA 

Sbjct: 91 LIFALVKINKVKLLYPIQAIFVSFLRGTPILVQLMLTYYGIPLFLKFLNQKYGFDWNVNA 150 

Query: 121 IPASVFAITAFAFNEAAYTSETIRAAILSVDQGEIEAARSLGMTSAQVYRRVIIPNAAW 180 
IPAS+FAITAFAFNEAAY SETIRAAILSVD GEIEAA+SLGMTS QVYRRVIIPNA W 
Sbjct: 151 IPASIFAITAFAFNEAAYASETIRAAILSVTDTGEIEAAKSLGMTSVQVYRRVIIPNATW 210 



Query: 181 ATPTLINTLIGLTKGTSLAFNAGIVEMFAQAQIMGGSDYRYFERYISVALVYWAVSFLIE 240 
A PTLIN LIGLTKGTSLAFNAGIVEMFAQAQI+GGSDYRYFERYISVALVYW++S L+E 
Sbjct: 211 AIPTLINGLIGLTKGTSLAENAGIVEMFAQRQILGGSDYRYFERYISVALVYWSISILME 270 

Query: 241 QLGNAIERKMAIKAPRHLTDEIPGGVR 267 

Q+G IE KMAIKAP +E G +R 
Sbjct: 271 QVGRLIENKMAIKAPEQARHEKLGELR 297 

There is also homology to SEQ ID 4794. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 960 

A DNA sequence (GBSx1018) was identified in S.agalactiae <SEQ ID 2927> which encodes the amino 
acid sequence <SEQ ID 2928>. This protein is predicted to be amino acid ABC transporter, ATP-binding 
protein. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3205 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



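Query: 1 / Sbjct: 1 (rows not recoverable)

Query: 61 / Sbjct: 61 (match line only)
K ++ LR++ AMVFQQ++LF 4T ++NV EGL I +KM Q+fl +A 

Query: 121 / Sbjct: 121 (match line only)
+EL KVGL D+ YP LSGGQKQRV +ARALA+ PDVLL DEPT+ALDPELVGEV + 

Query: 181 / Sbjct: 181 (match line only)
3 TM++V+H+M F +V+D+V+F+ ++G I+E GTPE++F H ++RT++F 

Query: 241 / Sbjct: 241 (rows not recoverable)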



A related DNA sequence was identified in S.pyogenes <SEQ ID 2929> which encodes the amino acid 
sequence <SEQ ID 2930>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1840 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
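
Each "Final Results" block lists one certainty value per predicted compartment. As a minimal sketch (not part of the original analysis), the block can be parsed and the highest-scoring compartment selected as follows. The line format `bacterial <compartment> --- Certainty=<value> (<label>)` is assumed from the blocks in this section, and the function names are hypothetical.

```python
import re

# "Final Results" block for SEQ ID 2930, as reported above.
FINAL_RESULTS = """\
bacterial cytoplasm --- Certainty=0.1840 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

LINE_RE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*(\d+(?:\.\d+)?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (compartment, certainty, label) tuples."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            results.append((m.group(1), float(m.group(2)), m.group(3)))
    return results


def predicted_compartment(text):
    """Compartment with the highest certainty value."""
    return max(parse_final_results(text), key=lambda r: r[1])[0]


print(predicted_compartment(FINAL_RESULTS))
```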

An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/247 (80%) , Positives = 229/247 (92%) 

Query: 1 MIKLRQLTKSFSGQKVLDKLDLDIHKGQWALVGASGAGKSTFLRSM1TYLEEPDYGTIEI 60 

MI +R Ii+K+FSGQKVLD L LDIEKGQV+ALVGASGAGKSTFLRS+NYLE+PD G+I I 
Sbjct: 2 MITIRI^SKTFSGQKVLDSLAIDIEKGQVIALVGASGAGKSTFLRSLNYLEKPDSGSISI 61 

Query: 61 DDFKVDFKSISKDDILTLRRKIAMWQQFNLFERRTALDWKEGLKIVKKMSDQEATRIA 120 

DF VDF++I+ 4- +L LRRKliAJWFQQEHIiFERRTAIi+NVKEGLK+VKK-)-SDQEAT+ +A 
Sbjct: 62 GDFTvnFETITTEQVLILRRKLAMVFQQFNLFERRTALENvra 121 

Query: 121 RDE^KVGLADREKYYPRHLSGGQKQRVALARAIiAMKPDVLLLDEPTSALDPELVGEVEK 180 

+ ELAKVGLADR+ +YPRHLSGGQKQRVA1AEALAMKPDVLLLDEPTSALDPELVGEVEK 
Sbjct: 122 QAEIAKVGIJUDRKHHYPRHLSGGQKQRVALAPAIAMKPDVLLLDEPTSALDPELVGEVEK 181 

Query: 181 SIADAAKQGQTMVLVSHDMNFVYQVADKVLFLEKGRILESGTPEQLFNHPLEERTKEFFA 240 

SI DAAK GQTMVLVSHDMNFVYQVAD+VLFL++G+ILE GTPE++F HP +ERTKEFFA 
Sbjct: 182 SITDAAKSGQTMVLVSHDMNFVYQVADRVLFLDQGKILEQGTPEEVFRHPQKERTKEFFA 241 

Query: 241 SYNKSYL 247 

SY+K+Y+ 
Sbjct: 242 SYSKTYI 248 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 961 

A DNA sequence (GBSx1019) was identified in S.agalactiae <SEQ ID 2931> which encodes the amino 
acid sequence <SEQ ID 2932>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0831 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 / Sbjct: 7 (rows not recoverable)

Query: 61 / Sbjct: 67 (match line only)
KM+E +KF E+ YG ++ + + GD+K V ++ Y+A+ VI+ATGA+ LGVPGE+ 

Query: 121 / Sbjct: 127 (match line only)
t RGVSYCAVCDGAFF+ -n-L+WGGGDSAVEEAV-t-LT+FA VTI IHRRDQLRAQK+ 

Query: 181 / Sbjct: 187 (match line only)
LQ RAF N+KI+F+WD WK+I G + KVS VT+E+ KTGE + GVFIY+G+ P + 

Query: 241 / Sbjct: 247 (match line only)
f- G+++T+ M+TS+PG++A GDVR+K LRQI TA G+G++A Q V 4 

Query: 301 / Sbjct: 307 (rows not recoverable)

A related DNA sequence was identified in S.pyogenes <SEQ ID 2933> which encodes the amino acid 
sequence <SEQ ID 2934>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0356 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 235/300 (78%), Positives = 273/300 (90%) 





Query: 1 MYDTLI IGSGPGGMTAALYAARSNIiKVGL I EQGAPGGQMNNTAEIENYPGYDHT SGPELS 60 

MYDTLIIGSGP GMTAALYAARSNL V +IEQGAPGGQMNNT +IENYPGYDHISGPEL+ 
Sbjct: 1 MYDTLIIGSGPAGMTAALYAARSNLSVAIIECGAPGGQMNNTFDIENYPGYDHISGPELA 60 

Query: 61 MKMYEPLEKFEvEHIYGIVQRvEMOGDVKRVITEDESYEAKTOILATGAKNSLLGVPGEE 120 

MKMYEPLEKF VE+IYGIVQ++EN GD K V+TED SYEAKTVT+ATGAK +LGVPGEE 
Sbjct: 61 MKMYEPLEKEWENIYGIVQKIENFGDYKCVLTEDASYEAKTVIIATGAKYRVLGVPGEE 120 

Query: 121 EYTSRGVSYCAVCDGAFFHDQDLLWGGGDSAVEEAVFLTQFAKSVTIIHRRDQLRAQICV 180 

YTSRGVSYCAVCDGAFFRDQDLLWGGGDSAVEEA+ +LTQFAK VT+ +HRRDQLRAQK+ 
Sbjct: 121 YYTSRGVSYC^VCDGAFFRDQDLLWGGGDSAVEEAIYLTQFAKKVTVVHRRDQLRAQKI 180 

Query: 181 LQDRAFANEKIKF^VWDSWKEIKGIffilKA/SGVTVENLKTGEISEMTFGGVFIYVGLKPHS 240 

LQDRAFAN+K+ F+WDSWKEI+GN+IKVS V +EN+KTG++++ FGGVFIYVG+ P + 
Sbjct: 181 I£DRAFANDKVDFITOSVVKEIQGiroiKVSlWLIEN^ 240 

Query: 241 SMVSELGITDETGfJVlTDlWKTSIPGLYAIGDTOQKDLRQIATAVGEGAIAGQGVYNYl 300 

MV +Jj ITD GW++TD +M+TSIPG++AIGDVRQKDLRQI TAVG+GAIAGQGVY+Y+ 
Sbjct: 241 GMVKDLEITDSEGWIITDDHMRTSIPGIFAIGDVRQKDLRQITTAVGDGAIAGQGVYHYL 300 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 962 

A DNA sequence (GBSx1020) was identified in S.agalactiae <SEQ ID 2935> which encodes the amino 
acid sequence <SEQ ID 2936>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3626 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15163 GB:Z99120 similar to nicotinate phosphoribosyltransferase 
[Bacillus subtilis] 
Identities = 309/476 (64%), Positives = 384/476 (79%), Gaps = 2/476 (0%) 



Query: 2 YKDDSLTLHTDLYQINMMQVYFNKGIHNKRAVFEAYFRKVPFENGYAVFAGLERIVRYLE 61 

+KDDSL+LHTDLYQINM + Y+ GIH K+A+FE +FR++PFENGYAVFAGLE+ + YLE 

Sbjct: 6 FKDDSLSLHTDLYQINMAETYWRDGIHEKKAIFELFFRRLPFENGYAVFAGLEKAIEYLE 65 

Query: 62 NLSFSDSDLSYLE-ELGYPEEFIiDYLKNLroffiLTVKSAKEGDLVFANEPLVQIEGPLAQC 120 

N F+DSDI1SYI1+ ELGY E+F++YL+ L ++ S KEG+LVF NEP++++E PL + 

Sbjct: 66 NFKFTDSDLSYLQDELGYHFJDFIEYLRGLSFTGSLYSMKEG3LVFNNEPIMRVEAPLVEA 125 

Query: 121 / Sbjct: 126 (match line only)
QL+ETA+LNI +NYQTL+ATKAARI + VI DE LEFGTRRA EMDAR+WG RAA+IGG + 

Query: 181 / Sbjct: 186 (match line only)
ATSNVRAGK FNIPVSGTHAHALVQ Y D+Y AFK YAETHKDCVFLVDTYDTLR G+PN 

Query: 241 / Sbjct: 246 (match line only)
AIRVAKE G++INF+G+RLDSGDLAYLSKK R+ LD+AGF +AK+ AS+DLDE+TI+NLK 

Query: 301 / Sbjct: 306 (match line only)
Q A+IDVWGVGTKLITAYDQPALGAVYK+V+IE D G M DTIK+S+N EKV+TPG+K+ 

Query: 361 / Sbjct: 365 (match line only)
- MFHP +T+I+K V +F A L IF+KG 

Query: 421 / Sbjct: 425 (match line only)
LW+EYKR+ P++YPVDL+ r 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2937> which encodes the amino acid 
sequence <SEQ ID 2938>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3192 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 409/484 (84%) , Positives = 446/484 (91%) 

Query: 1 m-KDDSLTLHTDLYQINMMQVYFNKGIHNKRAVFEAYFRKVPFENGYAVFAGLERIVRYL 60 

MYKDDSLTLHTDLYQINMMQVYF +GIHN+ AVFE YFRK PF NGYAVFAGL+R+V YL 
Sbjct: 1 I^KDDSLTLHTDLYQINMMQVYFEQGIHNRHAVFEVYFRKEPFNNGYAVFAGLQRMVEYL 60 

Query: 61 ENLSFSDSDLSYLEELGYPEEFLDYLKNLKMELTVKSAKEGDLVFANEPLVQIEGPLAQC 120 

E FS++DL+YLEELGYPE FL YLK L++ELT++SAKEGDDVFANEP+VQ+EGPL QC 
Sbjct: 61 EQFQFSETDLAYLEELGYPENFLTYLKELRLELTIRSAKEGDLVFANEPIVQVEGPLGQC 120 

Query: 121 QLVETAIIiNI INYQTLVATKAARIRSVIEDEPLLEFGTRRAQEMDAAIWGTRAAI IGGAN 180 

QLVETA+LNI +N+QTL+ATKAARIRSVIEDEPLLEFGTRRAQE+DAAIWGTRAA+ IGGA+ 
Sbjct: 121 QLVETALLNIVNFQTLIATKAARIRSVIEEIEPLLEFGTRRAQELDAAIWGTRAAMIGGAD 180 

Query: 181 ATSNVRAGKIFNIPVSGTHAHALVQTYGDDYQAFKAYAETHKDCVFLVDTYDTLRVGVPN 240 

ATSNVRAGK F+IPVSGTHAHALVQ YG+DY AF AYA+THKDCVFLVDTYDTL+VGVP 
Sbjct: 181 ATSNVFAGKRFDIPVSGTHAHALVQAYG1\DYDAFMAYAKTHKDCVFLVDTYDTLKVGVPT 240 

Query: 241 AIRVAKEMGEKINFLGVRLDSGDIAYLSKKVRQQLDDAGFPNAKIYASNDLDENTIIJSLK 300 

AIRVAKEMG+KINFLGVRLDSGDLAYLSK VRQQLDDAGF AKIYASNDLDENTILNLK 
Sbjct: 241 AIRVAKEMGDKINFLGTOLDSGDLAYLSKTTOQQLDDAGFTEaKIYASlTOLDENTILNLK 300 

Query: 301 MQKAKIDWGVGTKIjITAYDQPALGAVYKIVSIETDAGSMRDTIKLSNNAEKVSTPGKKQ 360 

MQKAKIDVWGVGTKLITAYDQPALGAVYKIVSIE + GSMRDTIKLSNMAEKVSTPGKKQ 
Sbjct: 301 MQKAKIDWGVGTKLITAYDQPALGAVYKIVSIEQEDGSMRDTIKLSNNAEKVSTPGKKQ 360 

Query: 361 VWRITSRAKGKSEGDYITFADTDVTQLDEIFjyiFHPrYTYINKTVRDFDAVPLLVDIFDKG 420 

VWRITSR KGKSEGDYITF D +V +L EIEMFHPTYTYI KTV++FDA+PLLVDIF KG 
Sbjct: 361 VWRITSP^KGKSEGDYITFTDIW'NELTEIEMFHPTYTYIKKTVKEFDAIPLLVDIFVKG 420 

Query: 421 KLWQLPSLQEIQEYGRKEFDQLWDEYKSVLjNPQDYPVDLARDVWQNKMDLIDRIRKEAL 480 

+LVYQLP+L EI+ Y +KEFD+LWDEYKRVnNPQDYPVDLRRDVWQNKM LID IRK+A 
Sbjct: 421 ELVYQLPTIAEIKAYAKKEFDKnW)EYKRvljM?QDYPVDI^^ 480 

Query: 481 AKGE 484 
K E 

Sbjct: 481 GKSE 484 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 963 

A DNA sequence (GBSx1021) was identified in S.agalactiae <SEQ ID 2939> which encodes the amino 
acid sequence <SEQ ID 2940>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2744 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC74810 GB:AE000269 NAD synthetase, prefers NH3 over glutamine 
[Escherichia coli K12] 
Identities = 173/274 (63%), Positives = 214/274 (77%), Gaps = 1/274 (0%) 

Query: 1 MTLQDQIIKELGVKPVINPSQEIRRSVEFLKDYLLKHSFLKTYVLGISGGQDSTLAGRLA 60 

MTLQ QIIK LG KP IN +EIRRSV+FLK YL + F+K+ VLGISGGQDSTLAG+L 
Sbjct: 1 MTLQQQIIKALGAKPQINAEEEIRRSVDFLKSYLQTYPFIKSLVLGISGGQDSTLAGKLC 60 

Query: 61 QIAVEELRADTG-ENYQFIAIRLPYGIQADEEDAQKALDFIKPDIALTINIKEAVDGQVR 119 

Q+A+ ELR +TG E+ QFIA+RLPYG+QADE+D Q A+ FI+PD LT+NIK AV + 
Sbjct: 61 QMAINELRLETGNESLQFIAVRL?YGVQADEQDCQDAIAFIQPDR\rLTVNIKGAVLASEQ 120 

Query: 120 ADNAAGVEITDFNKGNIKARQRMrSQYAVAGQYAGAVIGTDHAAENITGFFTKFGDGGAD 179 

AL AG+E++DF +GN KAR+RM +QY4-+AG +G V+GTDHAAE ITGFFTK+GDGG D 
Sbjct: 121 ALREAGIELSDFVRGNEKARERMKAQYSIAGMTSGVWGTDHAAEAITGFFTKYGDGGTD 180 

Query: 180 LDPLFRLNKSQGKQLLAELGADKALYEKIPTADLEENKPGIADEIALGVTYQEIDAYLEG 239 

+ PL+RLNK QGKQLLA L + LY+K PTADLE+++P + DE+ALGVTY ID YLEG 
Sbjct: 181 INPLYRLNIJRQGKQLLAALACPEHLYKKAPTADLEDDRPSLPDEVALGVTYDNIDDYLEG 240 

Query: 240 KWSDKSRGIIENWWYKGQHKRHLPITIFDDFWK 273 

K V + IENW+ K +HKR PIT+FDDFWK 
Sbjct: 241 KNVPQQVARTIENWYLKTEHKRRPPITVFDDFWK 274 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2941> which encodes the amino acid 
sequence <SEQ ID 2942>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3482 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 213/274 (77%), Positives = 242/274 (87%), Gaps = 1/274 (0%) 

Query: 1 MTLQDQIIKELGVKPVINPSQEII^SVEFLKDYLLKHSFLKTYVLGISGGQDSTLaGRLA 60 
MTLQ++II++LGVK I+P 4-EIR++V+FLK YL KHSFLKTYVLGISGGQDSTLAG+LA 
Sbjct: 15 MTLQEEIIRQLGVKASIDPQEEIRKA\TDFl.KAYLRKHSFLKTYVLGISGGQDSTIiAGKliA 74 

Query: 61 / Sbjct: 75 (rows not recoverable)

Query: 120 / Sbjct: 135 (match line only)
AL AAGVEI+DFNKGNIKARQRMISQYA+AGQ AGAVIGTDHAAENITGFFTKFGDGGAD 

Query: 180 (row not recoverable)
+LPLFRIaNK QGK LL LGAD ALYEK+PTADLE+ KPG+ADE+ALGVTYQ+ID YLEG 
Sbjct: 195 ILPLFRMKRQGKALLKVLGADAALYEKOTTADLEDQKPGLADEUALGVTYQDIDDYLEG 254 

Query: 240 KWSDKSRGIIENWWYKGQHKRHLPITIFDDFWK 273 
K++S ++ IE WW+KGQHKRHLPITIFDDFWK 
Sbjct: 255 KLISKVAQATIEKWWHKGQHKRHLPITIFDDFWK 288 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 964 

A DNA sequence (GBSx1022) was identified in S.agalactiae <SEQ ID 2943> which encodes the amino 
acid sequence <SEQ ID 2944>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2718 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA82960 GB:Z30315 aminopeptidase C [Streptococcus thermophilus] 
Identities = 363/444 (81%) , Positives = 407/444 (90%) 

Query: 1 MSKLTQTFTDKLFADYQANTKFSAIENAVTHNGLLKSLETRQSEIENDYVFSIDLTKDEV 60 

M+ L+ FT+KLFADY+AN K+ AIENAVTHNGLLKS+ETRQSE+END+VFSIDLTKDEV 
Sbjct: 1 MTSLSTDFTEKLFADYFANAKYGAIENAVTHNGLLKSIETRQSEVENDFVFSIDLTKDEV 60 

Query: 61 SNQKQSGRCWMFAAIOTFRHKLISDFKLENFELSQAHTFFTOKYEKSNWFMEQIIATANQ 120 

SNQK SGRCWMFAAUNTFRHKLISDFKIjE+FELSQAHTFFVroKYEKSNWF+EQIIATA+Q 
Sbjct: 61 SNQKASGRCWMFAAIiNTFRHKLl SDFKLESFELSQAHTFFWDKYEKSNWFLEQI IATATJQ 120 

Query: 121 ELSSRKVKFLLDVPQQDGGQWDIWVALFEKYGWPKTVYPESVSSSASRELNQYLNKLLR 180 

E+ SRKVKFLLD PQQDGGQWDMW+LFEKYGWPK+VYPESV+SS SRELNQYCNKLLR 
Sbjct: 121 EIGSRKVKFLLDTPQQDGGQWDMV\rSLFEKYGWPKSVYPESVASSNSRELNQYLNKLLR 180 

Query: 181 QDAQII^ELIAQGADGAWQNKKEELLQEIFNFLAMNLGLPPQSFDFAYRDKDNHYQSDK 240 

QDAQILR+LIA GAD A VQ KKEE LQEIFH+LAM LGLPP+ FDFAYRDKD++Y+S+K 
Sbjct: 181 QDAQILRDLIASGADQAAVQAKKEEFLQEIFNYLAMTLGLPPRQFDFAYRDKDDNYRSEK 240 

Query: 241 NITPECAFYQKYVNLDLSDYVSIINAPTVDKPYGQSYTVEMLGNWGGPAVKYL^ 300 

ITP+AF++KYV L LSDYVS+INAPT DKPYG+SYTVEMLGNWG P+V+Y+NL M RF 
Sbjct: 241 GITPRAFFEKyVGLKLSDYVSVINAPTADKPYC-KSYTVEMLGNWGAPSVRYINLPMDRF 300 



Query: 361 MTHA^m J TGvDLDESGQPLKWKvENSWGEKVGK^X3YFVASDAWMDEYTYQIvWKELLTK 420 
MTHAMVLTGVDLD G+P+KWK+ENSWG+KVG+ GYFVASDAWMDEYTYQIWRK+ LT 
Sbjct: 361 MTHAMVLTGVDLDADGKPIKWKIENSWGDKVGQKGYFVRSDAWMDEYTYQIWRKDFLTA 420 

Query: 421 EELEAYNAEP I TLAPWDPMGALAN 444 

EEL AY A+P LAPWDPMG+LA+ 
Sbjct: 421 EELAAYEADPQVLAPWDPMGSLAS 444 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2945> which encodes the amino acid 
sequence <SEQ ID 2946>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 3002 (Affirmative) < suco 

bacterial membrane — Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 369/443 (83%) , Positives = 407/443 (91%) 

Query: 1 MSKLTQTFTDKLFADYQANTKFSAIENAVTHNGLLKSLETRQSEIENDYVFSIDLTKDEV 60 

MS LT+TFT++LFA Y+AN KFSAIENAVTHNGLLKSLETRQSE++ND+VFSIDI1TKD+V 
Sbjct: 1 MSALTETFTEQLFAHYK^NAKFSAIENAVTHNGLLKSLETRQSEVDNDFVFSIDLTKDKV 60 

Query: 61 SNQKQSGRCM^FAALOTFRHKLISDFKLENFELSQAHTFFWDKYEKSNWFMEQIIATANQ 120 

SNQK SGRCWMFAALOTFRHKLI++FKLENFELSQAHTFFWDKYEK+NWFMEQ+rATA+Q 
Sbjct: 61 SNQKASGRCWMFAALNTFRHKLITEFKLENFELSC^iHTFFWDKYEKANWFMEQVIArADQ 120 

Query: 121 EIJSSRKVKFLLDVPQQDGGQWD^mALFEK^GWPKTvYPESVSSSASREI^QYrJNKLLR 180 

EL+SRKVKFLLDVPQQDGGQWDMW+LFEKYGWPK+VYPES+SSS SRELNQYLNKLLR 
Sbjct: 121 ELTSRKVKFLLDVPQQDGGQWDMWSLFEKYGvVPKSVYPESISSSNSRELNQYLNECLLR 180 

Query: 181 QDAQII^LIAC£3ffiGATVQISIKKEEBL^^ 240 

QDAQILR+LIA GA V+++K ELLQEIFNFLAM LGLPP+ FDFAYRDKD+HY +K 
Sbjct: 181 QDAQILFJDLIASGAKADQVEDRKAELLQEIFNFLAMTLGLPPRHFDFAYRDKDDHYHVEK 240 

Query: 241 NITPKAFYQKYVNLDLSDYVSIINAPTVDKPYGQSYTVEMLGNWGGPAVKYLNLDMKRF 300 

+TP+AFY K+V L LSDYVS+INAPT DKPYG+SYTVEMLGNWG V+YLNLDMKRF 
Sbjct: 241 GLTPQAFYDKFVGLKLSDYVSVINAPTADKPYGKSYTVEMLGNWGSREVRYIiNLDMKRF 300 

Query: 301 KEIAIAQMKSGETVWFGSDVGQVSNRQKGILATTTYDFNSSMDIKLSQDKAGRLDYSESL 360 

KELAI QM++GE+VWFGSDVGQVS+RQKGILAT TYDF +SMDI LSQDKAGRLDYSESL 
Sbjct: 301 KELAIKQMQAGESVWFGSDVGQVSDRQKGILATWTYDFEASMDINLSQDKAGRLDYSESL 360 

Query: 361 MTHAMVLTGVDLDESGQPLKWKYENSWGEICVGICDGYFVASDAKMDEYTYQIVVRKELLTK 420 

MTHAMVLTGVDLDE+G+PLKWKVENSWGEKVG GYFVASDAWMDEYTYQIWRKE LT 
Sbjct: 361 MTHAMVLTGVDLDETGKPLKWKVENSWG3KVGDKGYFVASDAWMDEYTYQIVVRKEFLTA 420 

Query: 421 EELEAYNAEPITLAPWDPMGALA 443 

+EL AY EP LAPWDPMGALA 
Sbjct: 421 DELAAYEKEPQVLAPWDPMGALA 443 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 965 

A DNA sequence (GBSx1024) was identified in S.agalactiae <SEQ ID 2947> which encodes the amino 
acid sequence <SEQ ID 2948>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9533> which encodes amino acid sequence <SEQ ID 9534> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF17262 GB:AF210752 penicillin-binding protein 1A 
[Streptococcus pneumoniae] 
Identities = 412/725 (56%) , Positives = 544/725 (74%) , Gaps = 14/725 (1%) 

Query: 4 IKKESVIKLLKYAFGIIMGFIILAIVIGGLLFAYYVSRSPKLTDQALKSVNSSLVYDGNN 63 

+ K ++++L+KY + +1 AIV+GG +F YYVS++P L++ L + SS +YD N 

Sbjct: 1 MNKPTILRLIKyLSISFLSLVIAAIVLGGGVFFYYVSKAPSLSESKLVATTSSKIYDNKII 60 

Query: 64 KLIADLGSEKRESVSADSIPLNLWAITSIEDKRFFIOIRGVTIIYRILGAAVJHNLVSSNTQ 123 

+LIADLGSE+R + A+ IP +LV AI SIED RFF HRG+D RILGA ML S++ Q 
Sbjct: 61 QLIADLGSERRVNAQANDIPTDLVKAIVSIEDHRFFDHRGIDTIRILGAFLRNLQSNSLQ 120 

Query: 124 GGSTLDQQLIKLAYFSTNKSDQTLKRKSQETOIALQMERKYTKEEILTFYINKVYMGNGN 183 

GGSTL QQLIKL YFST+ SDQT+ RK+QE WLA+Q+E+K TK+EILT+YINKVYM NGN 
Sbjct: 121 GGSTLTQQLIKLTYFSTSTSDQTISRKAQFAWLAIQLEQKATKQEILTYYINKVYMSNGN 180 

Query: 184 YGMRTTAKSYFGKDLKELSIAQIJUjI^GIPQAPTQYDPYKNPESAQTRRNTVLQQMYQDK 243 

YGM+T A++Y+GKDL LS+ QIALLAG+PQAP QYDPY +PE+AQ RRN VL +M 
Sbjct: 181 YGMQTAAQNYYGKDLNNLSLPQLALIiAGMPQAPNQYDPYSHPFAAQDRRNLVLSEMKNQG 240 

Query: 244 NI S KKEYDQAVATP VTDGLKELKQKSTYPKYMDNYLKQVI SEVKQKTGKD I FTAGLKVYT 303 

IS ++Y++AV TP+TDGL+ LK S YP YMDNYLK+VI++V+++TG ++ T G+ VYT 
Sbjct: 241 YISAEQYEKAVNTPITDGLQSLKSASNYPAYMDNYIiKEVINQVEEETGYNLLTTGMDVYT 300 

Query: 304 NINTDAQKQLYDIYNSDTYIAYPNNELQIASTIMDATNG1WIAQLGGRHQNENISFGTNQ 363 

N++ +AQK L+DIYN+D Y+AYP++ELQ+ASTI+D +NGKVTAQLG RHQ+ N+SFG NQ 
Sbjct: 301 NVDQEaQKHLWDIYNTDEYVAYPDDELQVASTIVDVSNGKV-IAQLGARHQSSNVSFGINQ 360 

Query: 364 SVLTDRDWGSTMKPISAYAPAIDSGVYNSTGQSLNDSVYYWPGTSTQLYDWDRQYMGWMS 423 

+V T+RDWGSTMKP I + YAPA++ GVY+ST ++D Y +PGT T +Y+WDR Y G ++ 
Sbjct: 361 AVETNRDWGSTMKPITDYAPALEYGVYDSTATIvHDEPYNYPGTDTPVYNWDRGYFGNIT 420 

Query: 424 MQTAIQQSRNVPAVRALEAAGLDEAKSFLEKLGIYYPEMNYSNAISSNNSSSDAKYGASS 483 

+Q A+QQSRNVPAV L GL+ AK+FL LGI YP ++YSNAISSN + SD KYGASS 
Sbjct: 421 LQYALQQSRNVPAvETLNKVGUTOAKTFLNGLGIDYPSLHYSNAISSNTTESDKKYGASS 480 

Query: 484 EKMAAAYSAFANGGTYYKPQYVNKIEFSDGTNDTYAASGSRAMKETTAYMMTDMLKTVIjT 543 

EKMAAAY+AFANGGTYYKP Y++K+ FSDG+ ++ G+RAMKETTAYMMTDM+ KTVL 
Sbjct: 481 EKMAAAYAAFANGGTYYKPmiHKOTFSDGSEKEFSm?GTRAMKETTAYMMTDMKTVLV 540 

Query: 544 FGTGTKAAIPGVAQAGKTGTSNYTEDELAKIEATTGIYNSAVGTMAPDENFVGYTSKYTM 603 

+G G A +P + QAGKTGTSNYT++E+ K Y G +APDE FVGYT KY M 

Sbjct: 541 YGIGRGAYLPWLPQAGKTGTSNYTDEEIEK YIKNTGYVAPDEMFVGYTRKYAM 593 

Query: 604 AIWTGYKNRLTPLYGSQLDIATEVYRAMMSYLTGGYSA-DWTMPEGLYRSGSYLYINGTT 652 

A+WTGY NRLTPL G L +A +VYR+MM+YL+ G + DW +PEGLYR+G +++ NG 
Sbjct: 594 AVWTGYSNRLTPLVGDGLTVAAKVYRSMMTYLSEGSNPEDWNIPEGLYRNGEFVFKNGAR 653 

Query: 663 TTGTYSSSVYKNIYQNSGQSSQSSSSTSSEKQICEDKNTANDANSSSPQVETPNNGNATTP 722 

+T +SS + S +SS SSS +S+ + + N++ +++P T + TTP 

Sbjct: 654 ST--WSSPAPQQ--PPSTESSSSSSDSSTSQSNSTTPSTNNSTTTNPNNNTQQSN--TTP 707 

Query: 723 NNSNQ 727 

+ NQ 
Sbjct: 708 DQQNQ 712 
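
The summary line that heads each alignment gives raw counts alongside whole-number percentages. As a minimal sketch (the function names `parse_summary` and `exact_percent` are illustrative and not part of the source), the counts can be parsed and the ratios recomputed, which helps when checking a scanned figure against its counts. The printed percentages are whole numbers, so the recomputed values are not expected to match them exactly.

```python
import re

# Matches fields such as "Identities = 412/725 (56%)"; spacing is kept loose
# because the scanned text is irregular.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Map each field name to (count, alignment_length, printed_percent)."""
    return {
        name: (int(count), int(length), int(printed))
        for name, count, length, printed in FIELD_RE.findall(line)
    }


def exact_percent(count, length):
    """Recompute the percentage from the raw counts."""
    return 100.0 * count / length


if __name__ == "__main__":
    line = "Identities = 412/725 (56%) , Positives = 544/725 (74%) , Gaps = 14/725 (1%)"
    for name, (count, length, printed) in parse_summary(line).items():
        ratio = exact_percent(count, length)
        print(f"{name}: {count}/{length} = {ratio:.2f}% (printed {printed}%)")
```

A count that OCR has damaged (for example a letter in place of a digit) simply fails to match the pattern, so a missing field in the result is itself a useful warning.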
A related DNA sequence was identified in S.pyogenes <SEQ ID 2949> which encodes the amino acid 
sequence <SEQ ID 2950>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.96 Transmembrane 19 - 35 ( 9 - 43) 

Final Results 

bacterial membrane --- Certainty=0.6583 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
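
Each "Final Results" block lists a certainty score for three candidate locations. A minimal sketch of reading such a block and picking the highest-scoring location follows; the names `parse_final_results` and `best_site` are hypothetical, and the pattern tolerates the stray spaces that scanning introduces into the numbers.

```python
import re

# One localisation line, e.g.
# "bacterial membrane --- Certainty=0.6583 (Affirmative) < succ>"
LINE_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, verdict)} for every line that parses."""
    results = {}
    for site, raw_value, verdict in LINE_RE.findall(text):
        # Remove OCR-inserted spaces, e.g. "0 . 6583" -> "0.6583".
        certainty = float(raw_value.replace(" ", ""))
        results[site] = (certainty, verdict.strip())
    return results


def best_site(results):
    """Return the site with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    block = """
    bacterial membrane Certainty=0 . 6583 (Affirmative) < succ>
    bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
    bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
    """
    parsed = parse_final_results(block)
    print(best_site(parsed), parsed[best_site(parsed)])
```

Lines whose score contains a non-digit character are skipped rather than guessed, which is a deliberate choice for scanned input.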

The protein has homology with the following sequences in the databases: 

>GP:CAA88918 GB:Z49095 penicillin-binding protein 1a [Streptococcus pneumoniae] 
Identities = 422/712 (59%) , Positives = 536/712 (75%) , Gaps = 8/712 (1%) 

Query: 4 IKNPKILKWLKYVLSAILSLIILVIIIGGLLFTFYISSAPKLSEAQLKSTNSSLWDGNN 63 
+ P IL+ +KY+ + LSL+I I++GG +F +Y+S AP LSE++L +T SS +YD N 
Sbjct: 1 MKKPTILRLIICYLSISFLSLVIAAIVLGGGVFFYYVSKAPSLSESKLVATTSSKIYDNKN 60 

Query: 64 NLIADLGSEKRENVTADSIPINLVNAITSIEDKRFPNHRGvDLYRIFGAAFHNLTSQTTQ 123 
LIADLGSE+R N A+ IP +LV AI SIED RFF+HRG+D RI GA NL S + Q 
Sbjct: 61 

Query: 124 
GGSTL QQLIKL YFST+ SDQT+ RKAQE WLA+Q+E+K TKQEILT+YINKVYM NGN 
Sbjct: 121 

Query: 184 
YGM TAA++YYGKDL +LS QLALLAG+PQAF+QYDPY HP3AAQ+KRN+VL +M 
Sbjct: 181 

Query: 244 
S A+ TP+ +GLQSL+ S YP YMDNYLK+VI +V++ET 
Sbjct: 241 

Query: 304 
Sbjct: 301 

Query: 364 
Sbjct: 

Query: 424 
IQ A+ SRNV AV L GLD A++FL+ LGI+YP MHY+NAISSN + S+KKYGASS 
Sbjct: 421 

Query: 484 
EKMAAAYAAFANGGIYHKP Y+NK+ FSDG+ K F + G RAMKETTAYMMT+M+KTVLT 
Sbjct: 481 

Query: 544 
YGTG A +P + QAGKTGTSNYTDEE+ K 
Sbjct: 541 

Query: 604 
AVWTGY NRLTP+ G +A VYRSM+TYL+ + DWTMP+GLYR4G F++ +G 
Sbjct: 594 

Query: 663 
Sbjct: 654 




An alignment of the GAS and GBS proteins is shown below. 
Identities = 521/729 (71%) , Positives = 621/729 (84%) , Gaps = 10/729 (1%) 



Query: 1 MITIKKESVIKLLKYAFGIIMGFIILAIVIGGLLFAYYVSRSPKLTDQALKSVNSSLVYD 60 
+ ITIK ++K LKY 1+ IIL I+IGGLLF +Y+S +PKL++ LKS NSSLVYD 
Sbjct: 1 VITIKNEKILKWLKYVLSAILSLIILVIIIGGLLFTFyiSSAPKLSEAQLKSTNSSLVYD 60 

Query: 61 GNNKLIADLGSEKRESVSADSIPLNLvmiTSIEDKRFFKHRGVDIYRILGAAWHNLVSS 120 
GNN LIADLGSEKRE+V+ADSIP+NLVNAITSIEDKRFF HRGVD+YRI GAA+HNL S 
Sbjct: 61 GNHNLIADLGSEKRENVTADSIPINLVNAIT3IEDKRFFNHRGVDLYRIFGAAFHNLTSQ 120 

Query: 121 OTQGGSTLDQQLIKIAYFSTNKSDQTIjKRKSQEVKLALQMERiCYTKEEILTFYINfCVYMG 180 
TC^STLDQQLIKIAYFSTN+SDQTLKRK+QEvWLALQMERKYTK+EILTFYINKVYMG 
Sbjct: 121 TTQGGSTLDQQLIKIAYFSTKESDQTLKRKAQHVPnjALQMERKYTKQEILTFYINKVYMG 180 

Query: 181 NGNYGMRTTAKSYFGKDLKELS IAQLALLAGI PQAPTQYDPYKNPESAQTRRNTVLQQMY 240 
NGNYGM T AKSY+GKDLK+LS AQLALLAGIPQAP+QYDPY +PE+AQ RRN VLQQMY 
Sbjct: 181 NGNYGMLTAAKSYYGKDLKDLSYAQU^IAGIPQAPSQYDPYLHPFJ^QNRRNVVLQQMY 240 

Query: 241 QDKNISKKEYDQAVATPVTDGLKELKQKSTYPKYKDNYLKQVISEVKQKTGKDIFTAGLK 300 
+K+++K EY+ A+ATPV +GL+ L+Q+STYPKYMDNYLKQVI EVK++T KDIFTAGLK 
Sbjct: 241 MEKHLTKAEYETAIATPVAEGLQSLQQRSTYPKYMDMYLKQVIEEVKKETNKDIFTAGLK 300 



VYTNI DAQ+ LY+IY+S Y+ YP+ + Q+AST1+D TNG VIAQLGGR+Q+EN+SFG 



1 ASSEKMAAAYSAFANGGTYYKPQYVNKIEFSDGTNDTyAASGSRAMKETTAYMMTDMLKT 5 

ASSEKMAAAY+AFANGG Y+KP+YVNK+EFSDGT+ T+ G RAMKETTAYMMTDMLKT 
1 ASSEKMAAAYAAFANGGIYHKPRYVNKVEFSDGTSKTFDEKGKRAMKETTAYMMTDMLKT 5 



I TTTTGT-YSSSVYKNIYQNSGQSSQSSSSTSSEKQKEDKNTANDAWSSSPQVETPNNGNA 719 

T + T Y++SVY N+Y N ++++ SS+ +D +++ND ++S+ T NNG+ 

L TYASNTDYTNSVYNNLYSN NTTTASSQTTSDDTSSSNDTSNST- - -NTDNNGSH 711 

Query: 720 TTPNNSNQT 728 



A related GBS gene <SEQ ID 8695> and protein <SEQ ID 8696> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 6.55 
GvH: Signal Score (-7.5): -1.98 
Possible site: 36 

>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 4.03 threshold: 0.0 
PERIPHERAL Likelihood = 4.03 201 
modified ALOM score: -1.31 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

57.5/76.2% over 712aa    Streptococcus pneumoniae 
GP|6563351| penicillin-binding protein 1A    Insert characterized 
ORF00399(310 - 2484 of 2850) 

GP|6563351|gb|AAF17262.1|AF210752_1|AF210752(1 - 713 of 719) penicillin-binding protein 1A 
{Streptococcus pneumoniae} 
%Match =43.8 

%Identity =57.5 %Similarity =76.2 

Matches = 412 Mismatches = 166 Conservative Sub.s = 134 

237 267 297 327 357 387 417 447 

LI ISEKMDFS *RRVPFT,KSLT* ILLKKNY*AVITI KKESVIKLLKYAFGI IMGFIIIAIVIGGLLFAYYVSRSPKLTDQA 
= | ::::|=|| : ::| |||,|| :( ||||::| |=: 

MNKPTILRLIKYLSISFLSLVIAAIVLGGGVFFYYVSKAPSLSESK 



LKSWSSLVYDGOTCKLIADLGSEKRESVSADSIPI^LVNAITSIEDKRFFraRGVDIYRILGAAWHNLVSSNTQGGSTLD 

I : II = 1 1 1=11111111=1 = 1 = II = 11 II Mil III MM Mill II l = = llllll 

LVATTSSKITDNKNQLIADLGSERRVNAQANDIPTDLVKAIVSIEDHRFFDHRGIDTIRILGAFLRNLQSNSLQGGSTLT 



717 747 777 807 837 867 897 927 

QQLIKLAYFSTNKSDQTLKRKSQEWLAI£MERKYT^ 

llllll 1111= HIM MM IIIMmi MUMIIIMI 111111=1 1 — 1 = 1111 Mill 

QQDIKLTYFSM^STSDQTISKKAQEAWI^IQLEQKATKQEILTYYINIOTMSNGNyGMQTAAQNYYGKDIJMLSLPQIALL 
140 150 160 170 180 190 200 

957 987 1017 1047 1077 1107 1137 1167 

AGIPQAPTQYDPYKNPESAQTRRNTVLQQMYQDKNISKKETY^ 
||:|||| Mil :||:|| ||| || :| || :: | :: || || : |||| : || | || | | | | | | | : | | : :| : : = 

AGMPCAPNQYDPYSHPEMQDRRNLvLSEMKNQGYISAEQYEKAVNTPITDGLQSLKSASNYPAYMDNYLKEVINQVEEE 
220 230 240 250 260 270 280 

1197 1227 1257 1287 1317 1347 1377 1407 

TGKDIFTAGLKWTNINTDAQKQLYDIYNSDTYIAYPNNELQIASTIMDAraGKVIAQLGGRHQNENISFGTNQSVLTDR 
II = = = l 1= 1111 = = =111 = 1=1111 = 1 I = I I I = = I I M I II = I =111111111 111= 1 = 111 11 = 1 1 = 1 
TGYNLLTTGMDVYTlStVDQEAQKHLWDIYNTDEYVAYPDDELQVASTIVDVSNGKVIAQLGARHQSSNVSFGINQAWTNR 
300 310 320 330 340 350 360 

1437 1467 1497 1527 1557 1587 1617 1647 

DWGSTMKPISAYAPAIDSGVYNSTGQSLNDSVYYWPGTSTQLYDVroRQ 

111111111= 1111 = = 111 = 11 -I I =111 I =1 = 111 I I = = = l 1 = 111111111 I 11= II 
DWGSTMKPITDYAPALEYGVYDSTATIVHDEPYNYPGTDTPVYOT^ 

380 390 400 410 420 430 440 

1677 1707 1737 1767 1797 1827 1857 1887 

SFLEK1GIYYPEMNYSNAISSNNSSSDAKYGASSEKMAAAYSAFANGGTYYK 

= 11 III II = = 11111111 = II llllllllllll|:|||lll!llll Ml: 1111= == 1 = 111111 
TFLNGLGIDYPSLHYSNAISSNTTESDKKYGASSEKMAAAYAAFANGGTYYKPNIYIHKWFSDGSEKEFSNVGTRAMKE^ 
460 470 480 490 500 510 520 

1917 1947 1977 2007 2037 2067 2097 2127 

TAYMMTDMLKTVLTFGTGTKAAIPGVAQAGKTGTSNYTEDEIAKIEATTGIYNSAVGTMAPDENFVGYTSKYTMRIWTGY 

11111111 = 1111 =11 MM IIIIMIMIMM I I I mil Mill II IMIM 

TAYMMTDMMKTVLVYGIGRGAYLPWLPQAGKTGTSNYTDEEIEK YIKNTGYVAPDEMFVGYTRKYAMAVWTGY 

540 550 560 570 580 590 
2157 2187 2214 2244 2274 2304 2334 2364 

KWLTPLYGSQLDIATEVYRflMMSYLT-GGYSADKTMPEGLYRSC-SYLYINGTTTTGTYSSSVYKNIYQNSGQSSQSSSS 

I I I I I I I I :| :|l|:|l = ll= I II =111111 = 1 = = = II =1 =11 = I =11 III 

SKRLTPLVGDGLTVAAI<VYRSMMTYLSEGSNPEDhT\ I IPEGLYRNGEF\'FKI'IGARST--WSSPAPQQ--PPSTESSSSSSD 
610 620 630 640 650 660 670 

2394 2424 2454 2484 2514 2544 2574 2604 

TSSEKQKEDKOTMJD&NSSSPQVETPl^^ 
:|: : : |:: :::| ] : |||: || 



SEQ ID 8696 (GBS146) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 4; MW 82kDa), in Figure 168 (lane 11-13; MW 96.5kDa) and in Figure 
238 (lane 8; MW 96.5kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis 
of total cell extract is shown in Figure 49 (lane 2; MW 107kDa). 

Purified Thio-GBS146-His is shown in Figure 244, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 


Example 966 

A DNA sequence (GBSx1025) was identified in S.agalactiae <SEQ ID 2951> which encodes the amino 
acid sequence <SEQ ID 2952>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3547 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26957 GB:M90528 ORF [Streptococcus oralis] 
Identities = 143/196 (72%), Positives = 165/196 (83%) , Gaps = 1/196 (0%) 

Query: 1 ^WNYPHQLIRKTT 1 /TKSKKKKIDPANRGMSFEAAINATNDYYLSHELAVIHKKPTPVQIV 60 

MVNYPH++ + + K +FANRGMSFE INATNDYYLSH LAVIHKKPTP+QIV 

Sbjct: 1 MVOTPHKISSQKRQAPPSQTK-NFANRGMSFEKJIINA'INDYYLSHGLAVIHKKPTPIQIV 59 

Query: 61 KVDYPKRSRAKIVEAYFRQASTTDYSGVYKGYYIDFEAKETRQKTAMPMKNFHAHQIEHM 120 

+VDYP+RSRAKIVEAYFRQASTTDYSGVY GYYIDFEAKETRQK A+PMKNFH HQI+HM 
Sbjct: 60 RVDYPQRSRAKIVEAYFRQASTTDYSGVYDGYYIDFEAKETRQKHAIPMKNFHHHQIQHM 119 

Query: 121 ANV1QQKGICFVLLHFSTLKETYLLPANELISFYQIDKGNKSMPIDYIRKNGFFVKESAF 180 

VL Q+GI CFVLLHF++ +ETYLLPA +L1 FY DKG KSMP+ YIR+NG+ ++ AF 
Sbjct: 120 EQVLAQRGICFVLLHFASQQETYLLPAVDLIRFYHQDKGQKSMPLGYIRENGYRIELGAF 179 

Query: 181 PQVPYLDIIEEKLLGG 196 

PQ+PYLDII+E LLGG 
Sbjct: 180 PQIPYLDI IKEHLLGG 195 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2953> which encodes the amino acid 
sequence <SEQ ID 2954>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 
Final Results 

bacterial cytoplasm --- Certainty=0.5030 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 166/199 (83%) , Positives = 177/199 (88%) 

Query: 1 MVNYPHQLIRKTTVTKSKKKKIDFAmGMS?EAA.IKATMDYYLSHEIAVIHKKPTPVQIV 60 

MVNYPH LIR+ + K+ K+DFANRGMSFEAAINATNDYYLS ++AVIHKKPTPVQIV 
Sbjct: 1 ITOTrPHNLIRQKVSSVQKQNKVDFANRGMSFSAA.INATNDYYLSRQIAVIHKKPTPVQIV 60 

Query: 61 KVDYPKRSRAKIVEAYFRQASTTDYSGVYKGYYIDFEAKETRQKTAMPMKNFHAHQIEHM 120 

KVDYPKRSRAKIVEAYFRCASTTDY GVYKG+Y+DFEAKETRQKTAMPMKNFH HQIEHM 
Sbjct: 61 KVDYPKRSRAKIVEAYFRQASTTDYCGVYKGHYVDFEAKETRQKTAMPMKNFHLHQIEHM 120 

Query: 121 ANVLQQKGICFVLLHFSTLKETYLLPANELISFYQ1DKGKKSMPIDYIRKNGFFVKESAF 180 

. A VL QKGICFVLLHFSTLKETY LPA IiISFYQID G+KSMPIDYIRKNGF V AF 
Sbjct: 121 ACVLHQKGICFVLLHFSTLKETYYLPAQALISFYQIDNGSKSMPIDYIRKNGFK\?AFGAF 180 

Query: 181 PQVPYLDI IEEKLLGGDYN 199 

PQVPYI.+IIE+ LGGDYN 
Sbjct: 181 PQVPYLNI IEQNFLGGDYN 199 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 


Example 967 

A DNA sequence (GBSx1026) was identified in S.agalactiae <SEQ ID 2955> which encodes the amino 
acid sequence <SEQ ID 2956>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3227 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14136 GB:Z99115 similar to hypothetical proteins from B. subtilis [Bacillus 
subtilis] 

Identities = 74/174 (42%) , Positives = 97/174 (55%) , Gaps = 6/174 (3%) 

Query: 5 ILVTGYKNFELGIFQDKDPRITIIIOCAIDKDFRRFLENGADWFIFMGNLGFEYWALEVAL 64 

+ +TGYK FELGIF+ D + IKKAI FL+ G +W + G LG E WA E A 

Sbjct: 4 LAITGYKPFELGIFKQDDKALYYIKKAIKNRLIAFLDEGLEWILISGQLGVELWAAFAAY 63 

Query: 65 DLQKEY-DFQIATIFTFENHGQNWNEANKAKL-ALFKQVDF-VKYTFPSYENPGQFKQYN 121 

DLQ+EY D ++A I F +NW E NK + A+ Q D+ T YE+P QFKQ N 
Sbjct: 64 DLQEEYPDLKVAVITPFYEQEKNWKEPNKEQYFA.VLAQADYEASLTHRPYESPLQFKQKN 123 

Query: 122 HFLINNTQGAYLFYDSENETNLKFLLEMMEKK EAYDISFLiTFDRLNEIYEE 172 

FI+ + G LYDEE + K++L EK+ + Y I F+T D L EE 
Sbjct: 124 QFFIDKSDGLLLLYDPEKEGSPKYMIjGTAEKRREQDGYPIYFITMDDLRVTVEE 177 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2957> which encodes the amino acid 
sequence <SEQ ID 2958>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3041 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 



Identities = 102/167 (61%) , Positives = 127/167 (75%) 




Query: 3 STILVTGYKNFELGIFQDKDPRITIIKKAIDKDFRRFLENGADWFIFMGNLGFEYWALEV 62 

+ IL+TGY++FE+GIF KDPR++IIK+AI KD +LENG DWFIF GNLGFE WALEV 
Sbjct: 2 TAILITGYRSFEIGIFDHKDPRVSIIKQAIRKDLIGYLENGVDWFIFTGNLGFEQWALEV 61 



Query: 63 ALDLQKEYDFQIATIFTFENHGQNVmEANKAKLALFKQVDFVICYTFPSYENPGQFKQYNH 122 

A +L++EY QIATIF FE HG WNE NK L+ F+ VDFVKY FP+YE P QF QY 
Sbjct: 62 ANELKEEYPLQIATIFLFETHGDRWNEKNKEVLSQFRAVDFVKYYFPNYEQPTQFSQYYQ 121 




Query: 123 FLINNTQGAYLFYDSENETNLKFLLEriMEKKEAYDISFLTFDRUIEI 169 

FL+ T+GAY+FYD+ENETNLK+ h+ + Y + LTFDRLN++ 
Sbjct: 122 FLLEKTEGAYVFYDTENETNLKYFLKKAKDMPHYQLLLLTFDRLNDM 168 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 968 

A DNA sequence (GBSx1027) was identified in S.agalactiae <SEQ ID 2959> which encodes the amino 
acid sequence <SEQ ID 2960>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5188 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 969 

A DNA sequence (GBSx1028) was identified in S.agalactiae <SEQ ID 2961> which encodes the amino 
acid sequence <SEQ ID 2962>. This protein is predicted to be cell division protein DivIVA. Analysis of this 
protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2735 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9535> which encodes amino acid sequence <SEQ ID 9536> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 
>GP:CAB14135 GB:Z99115 ypsB [Bacillus subtilis] 
Identities = 46/102 (45%) , Positives = S9/102 (67%) , Gaps = 14/102 (13%) 

Query: 14 SPKDIFEQDFKVSMRGYDKKEVDVFLDDVIKDYENYLEQIEKLQMENRRLQQALDKKESE 73 

S K+I E++FK +RGY +++VD FLD +IKDYE + ++IE+LQ EN +L++ L+ E 
Sbjct: 9 SAKEILEKEFKTGVRGYKQEDVDKFLDMIIKDYETFHQEIEELQQENLQLKKQLE E 64 

Query: 74 ASNVRNSGTAMYNQKPIAQSATNFDILKRISRLEKEVFGRQI 115 

AS ++P+ + TNFDILKR+S LEK VFG ++ 

Sbjct: 65 AS KKQPVQSNTTNFDILKRLSNLEKHVFGSKL 96 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2963> which encodes the amino acid 
sequence <SEQ ID 2964>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4466 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 71/112 (63%), Positives = 85/112 (75%), Gaps = 6/112 (5%) 

Query: 8 ^SIIYSPKDIFEQDFKVSMRGYDKKEVDVFLDDVIKDYENYLEQIEKLQMENRRLQQAL 67 

M SIIYSPKDIFEQ+FK SMRG+DKKEVD FLD+VIKDYEN+ QIE L+ EN +AL 
Sbjct: 1 MTSIIYSPKDIFEQEFKTSMRGFDKKEVDEFLDNVIKDYENFNAQIEALKAEN EAL 56 

Query: 68 DKKESEASNVRNSGTAMYNQKP - - IAQSATNFDILKRISRLEKEVFGRQIRE 117 

K + +A N ++ +P +AQSATNFDILKRIS+LEKEVFG+QI E 

Sbjct: 57 KKAKFQARNTVSATVQQPVPQPTRVAQSATNFDILKRISKLEKEVFGKQIIE 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 970 

A DNA sequence (GBSx1029) was identified in S.agalactiae <SEQ ID 2965> which encodes the amino 
acid sequence <SEQ ID 2966>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence (or aa 1-19) 

Final Results 

bacterial cytoplasm --- Certainty=0.0655 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14134 GB:Z99115 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 204/382 (53%), Positives = 274/382 (71%), Gaps = 3/382 (0%) 

Query: 3 ESFKLIATAAAGLEAIVGREIRNLGIDCQVENGRTOFHGDIKTIIETMliWIjRAADRIKII 62 

+ + LIATA G+EA+V +E+R+LG +C+V+NG+V F GD I NLWLR ADRIK+ 
Sbjct: 2 KKYTLIATAPMGIEAVVAKEVraLGYECKTONGKVIFEGDAIAICRAraiWLRTADRIKVQ 61 

Query: 63 VGEFPAPTFEELFO^WGIIJWBNYLPI^SUCFPIAKAKCT^KLHNEPSVQAISKKAVAKK 122 

V FA TF+ELF+ ++W +++P KFP+ K VKS L + P Q I KKA+ +K 
Sbjct: 62 VASFKAKTFDELFEKTKAINVTOSFIPENGKFPVT-GKSVKSTIiASVPDCQRIVKKAIvEK 120 

Query: 123 LQKVFHRPEGVPLQENGAEFKIEVSILKDKATVMIDTTGSSLFKRGYRAEKGGAPIKENM 182 
h K+ 4.+E GAE+K+E+S+LKD+A + +D+4-G+ L KRGYR ++GGAPIKE + 
Sbjct: 121 L-KLQSGKANDWIEETGAEYKVEISIiKDQAIilTLDSSGTGLHKRGYRVDQGGAPIKETL 179 

Query: 183 AAAIIQLSNWFPDKPLIDPTCGSGTFCIEAAMIGMNIAPGFNRDFAFEAWPIWDQSQVQK 242 

AAA++QL+NW PD+P +DP CGSGT IEAA+IG NIAPGFNRDF E W W+ + K 
Sbjct: 180 AAALVQLTNOTPDRPFVDPFCGSGTIAIEAALIGQNIAPGFNRDFVSEDWEWIGKDLWNK 239 

Query: 243 VRDEAESKANYDIDLDISGFDLDGRMVEIARKNAEEAGLGDVIKLKQMRLQDLKTDKING 302 

REE KANYD h I D+D RMV+IA++NAEEAGLGD+I+ KQM+++D T+ G 
Sbjct: 240 ARLEVEEKANYDQPLTIFASDIDHRMVQIAKFJaAEEAGLGDLIQFKQMQVKDFTTNLEFG 299 

Query: 303 VIISNPPYGERLLDDJCAVDILYNEMGQTFAPLKTKSKFILTSDEGFEKKYGSQADKKRKL 362 

VI + NPPYGERL 4- KAV+ +Y EMGQ F PL TWS ++LTS+E FE+ YG +A KKRKL 
Sbjct: 300 VIVGNPPYGERLGEKKAVEQMYKEMGQAFEPLDTKSVYMLTSJ3ENFEEAYGRKATKKRKL 359 

Query: 363 YNGTLKVDLYQYYGERVRRQVK 384 

+NG +K D YQY+ +VR Q K 
Sbjct: 360 FNGFIKTDYYQYW- SKVRPQRK 380 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2967> which encodes the amino acid 
sequence <SEQ ID 2968>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0324 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 317/383 (82%) , Positives = 354/383 (91%) 

Query: 1 MKESFKLIATAAAGLEAIVGREIRNLGIDCQVENGRVRKHGDIKTIIETNLWtiRAaDRIK 60 

MKE+F+L+ATAAAGLEA+VG+E+R LG DCQVENG+V F GD++ I++TNLWLRAADRIK 
Sbjct: 1 MKETFRLVATAAAGLEAWGKEVRALGFDCQVENGKVYFEGDVEAIVKTKniiWLFAADRIK 60 

Query: 61 IIVGEFPAPTFEELFQGVYGLDWENYLPLGAKFPIAKAKOTKSKLHNEPSVQAISKKAVA 120 

IIVG+FPA TFEELFQGV+ LDWENYLPLGAKFPI+KAKCVKSKLHNEPSVQAI+KKAV 
Sbjct: 61 IIVGQFPARTFEELFQGVFALDWENYLPLGAKFPISKAKCVKSKLHNEPSVQAITKKAVV 120 

Query: 121 KKLQKVFHRPEGVPLQENGAEFKIEVSILKDKATVMIDTTGSSLFKRGYRAEKGGAPIKE 180 

KKLQK FHRPEGVPLQE G+ F IEVSILKD+AT+MIDTTGSSLFKRGYR +KGGAPIKE 
Sbjct: 121 KKLQKHFHRPEGVPLQEVGSTFNIEVSIDKDQATIMIDTTGSSLFKRGYRVQKGGAPIKE 180 

Query: 181 NMAAAIIQLSNWFPDKPLIDPTCGSGTFCIEAPi-lIGMNIAPGFNRDFAFEAWPWVDQSQV 240 

NMAAAI+ LSNWFPDKPL+DPTCGSGTFCIEAAMIGMNIAPGFNR FAFE W WVD+ V 
Sbjct: 181 NMAAAIIALSlWFPDKPLVDPTCGSGTFCIFA^nGMNIAPGFNRSFAFEEWSWVDKDMV 240 



Query: 301 NGVIISNPPYGERLLDDKAVDILYNEMGQTFAPLKTIVSKFILTSDEGFEKKYGSQADKKR 360 

NGV+ISNPPYGERLLDDK&VDILYNEMGQTFAPLKTWSKFILTSDE FE KYG +ADKKR 
Sbjct: 301 NGWISNPPYGERIjLDDKAVDILYNEMGQTFAPLEC3^SKFILTSDELFELKYGQI<ADKICR 360 

Query: 361 KLYNGTLKVDLYQYYGERVRRQV 383 

KLYNGTLKVDLYQ+YGERV+R +• 
Sbjct: 361 KLYNGTLKVDLYQFYGERVKRHL 383 

SEQ ID 2966 (GBS255) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 43 (lane 7; MW 44kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 4; MW 69kDa). 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 971 

A DNA sequence (GBSx1030) was identified in S.agalactiae <SEQ ID 2969> which encodes the amino 
acid sequence <SEQ ID 2970>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-15.02 Transmembrane 171 - 187 ( 167 - 193) 

Final Results 

bacterial membrane --- Certainty=0.7007 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
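
The "INTEGRAL" line reports a likelihood score and two residue ranges for the predicted transmembrane segment. A minimal sketch of extracting those numbers is given below. The names `Segment` and `parse_integral` are hypothetical, and reading the parenthesised range as a wider flanking region around the first range is an assumption, not something the text states.

```python
import re
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    likelihood: float
    start: int        # first residue of the reported segment
    end: int          # last residue of the reported segment
    outer_start: int  # parenthesised range (assumed to be a wider region)
    outer_end: int


INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> Optional[Segment]:
    """Parse one INTEGRAL line; return None when the line does not match."""
    match = INTEGRAL_RE.search(line)
    if match is None:
        return None
    likelihood = float(match.group(1))
    positions = [int(value) for value in match.groups()[1:]]
    return Segment(likelihood, *positions)


if __name__ == "__main__":
    seg = parse_integral(
        "INTEGRAL Likelihood =-15.02 Transmembrane 171 - 187 ( 167 - 193)"
    )
    if seg is not None:
        print(seg, "segment length:", seg.end - seg.start + 1)
```

For the line quoted above this yields a 17-residue segment (171 to 187) with a likelihood of -15.02.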

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD16120 GB:AF094508 dentin phosphoryn [Homo sapiens] 
Identities = 71/398 (17%) , Positives = 152/398 (37%) , Gaps = 16/398 (4%) 

Query: 16 TDGLEFKDAK-EMTVEEAVRKDSEIKAGITEEDSILDKYIKQHRDEVASQKFETKSSDFA 74 

+D + D+K + + E+ DS+ K+ ++ +S D S S 

Sbjct: 152 SDSSDSSDSKSDSSKSESDSSDSDSKSDSSDSNSSDSSDNSDSSDSSNSSNSSDSSDSSD 211 

Query: 75 NLDTASLDDFIKKQREELSAMLAAEELSKKLDNSVSQEQDTEANAVSPKEESSQEQENSV 134 
+ D++S D + S + S+ D+S S + D+ ++ S SS ++ 

Query: 135 TPVPPI^NTEAEPTATEPDSTIADSEEYKSSSKKRGGIVGTLIALILLLIVAIFGYNYFKN 194 
+ + ++ + + S +DS + SS + + + N + 

Query: 195 NNSTNSQTATSQSSSSKATTTSSEEDKKASQNLDNFNKSYANFFVDDKKTQLKNSEFDKL 254 

N+S+NS ++ S SS ++ +S D S + D+ N S D +S+ 

Sbjct: 324 NDSSNSSDSSDSSDSSDSSNSSDSSDSSDSSDSDSSNSS - -DSSNSSDSSDSCNS 376 

Query: 255 SELEKKVDALKGTKyYGKVKVKFDSLI<KQIDAVKAVNDKFKSPAvA/DGKKSEKLEVKTJGA 314 

S+ D+G+ + + D++N S+ +S+D + 

Sbjct: 377 SDSSDSSDSSDGSDSDSSNRSDSSNSSDSSDSSDSSNSSDSSDSSDSNESSNSSDSSDSS 436 



Identities = 64/341 (18%), Positives = 140/341 (40%), Gaps = 35/341 (10%) 

Query: 59 DEVASQKFETKSSDFANLDTASLDDFIKKQREELS-AMLAAEELSKKLDNSVSQEQDTEA 117 
D+ S K ++ SSD + D+++ D + S + +++ S D+S S + D+ 

Query: 118 NAVSPKEESSQEQENSVTPVPPLNTEAEPTATEPDSTIADSEEYKSSSKKRGGIVGTLIA 177 

++ S s + +S +++++ + +E DS+ +DS+ S S 
Sbjct: 136 SSNSSDSSDSSDSSDSSDSSDSSDSKSDSSKSESDSSDSDSKSDSSDSN 184 

Query: 178 LILLLIVAIFGYNYFKMNNSTNSQTATSQSSSSKATTTSSEEDKKASQNLDNFNKSYANF 237 

+++S NS ++ S +SS+ + ++ S + +S + D+ N S ++ 
Sbjct: 185 SSDSSDNSDSSDSSNSSNSSDSSDSSDSSDSSSSSDSSNSSDSS- 22B 

Query: 238 FVDDKKTQLKNSEFDKLSELEKKVDALKGTKYYGKVKVTCFDSLKRQID 297 
Query: 298 AWDGKKSEKLEVKDGANFDSLDSKTLNTGNASLDSLLHSIVSTGRWQVKQSEEQASSNK 357 

+ S+ + D +N S DS + + S DS S + N S+ SS+ 

Sbjct: 284 SSDSSNSSDSSDSSDSSN--SSDSSDSSDSSDSSDSSNSSDSNDSSNSSDSSDSSDSSDS 341 

Query: 358 VSDTQITEQPIiTVTNGQSSSSAftTINl'IQAAGTASGMLERNRS 398 
+ + ++ + ++ SS+S+ + N+ + + + + + S 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2971> which encodes the amino acid 
sequence <SEQ ID 2972>. Analysis of this protein sequence reveals the following: 

Possible site: 28 



Final Results 

bacterial membrane --- Certainty=0.6880 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF15293 GB:AF202180 erythrocyte membrane-associated giant 
protein antigen 332 [Plasmodium falciparum] 
Identities = 41/173 (23%) , Positives = 87/173 (49%) , Gaps = 10/173 (5%) 

Query: 1 VSEESKEVEVTKESQTLGLNEAKSMTIGEAVRKQSE IKAGVTKDDSILDKYIKQHR 56 

+ E + V4KE + GL+ + + ++V +Q+E I + K+ S ++ ++ 
Sbjct: 78 IEEAEENVWIEKEVEEEGLDNEEVIDEEDSVSEQAEEEVYINEEILKESSDVEDVKVENE 137 

Query: 57 DEVSSQKFDAKYTELDTASLDNFIKKQREALSKAGLVDDEPVSAESAEQDSTLVEEV 113 

4-EV+ + + LDN++ ++ E++++ +VD+ P S E E +S ++EE+ 

Sbjct: 138 LM^EV^ETQSVAENNEEDKELDNYVVEETESVTEEVV'TDEVPNSKEVQEIES-IIEEI 196 

Query: 114 AEDIAPMETTAVVTGIPVEATVPVLDLDPSERVIPEPQMTKEEPKRDQFLSED 166 

ED + G +E V + D SE ++ E +T+E K+4 ++ED 

Sbjct: 197 VEDGIjTTDDLVGQQGSVIEEVvEEVGSD-SEGI'VEEASITEEVEKKES-'VTED 247 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 234/506 (46%) , Positives = 304/506 (59%) , Gaps = 36/506 (7%) 



Query: 1 
Sbjct: 1 

Query: 61 VASQKFETKSSDFANLDTASLDDFIKKQREELSAMLAAEELSKKLDNSVSQEQDTEANAV 120 
V+SQKF+ K + LDTASLD+FIKKQRE LS A + + ++ S EQD+ 
Sbjct: 59 VSSQKFDAK- - -YTELDTASLDNFIKKQREALSK- - -AGLVDDEPVSAESAEQDSTLVEE 112 

Query: 121 SPKEESSQEQENSVTPVPPLNT EAEPTATEP- -DSTIADSEEYKSS 164 
Sbjct: 113 

Query: 165 
Sbjct: 

Query: 225 
? K Y F+ D K++LKNS F L +LE + AL+G+ YY K K K DSLK+ I 
Sbjct: 233 

Query: 285 
A+ AVN KF S WDG+K EVK ANFD L S TL GNA+LD++L + 
Sbjct: 293 AAITAVNGKFVSDWVDGEKVSA-E^/KADANFDDLSSATLTIGMANLDAVLQASITEGRQ 351 

Query: 345 QVKQSEEQASSNKVSDTQITEQPNVTNGQSSSSAATIMNQAAGTAS---GNIjERNRSRVP 401 

Q+ E A K ++ Q Q GQS+S A + G S +L+R+ SRVP 

Sbjct: 352 QLASKAEAA KAANEQAV- QDQAAQGQSTSVAPS GYGLTSYDPASLQRHLSRVP 403 

Query: 402 YNNAAIADTGNPAWI FNPGVLEKIVATSQARGYFSGNNYILEPVNI INGNGYYNMFKLDG 461 

YN IAD NP+W FNPGVLEKIVATSQARGY SGN YILEPVNIINGNGYYNMFK DG 
Sbjct: 404 YNQDVIADRANPSWAFNPGVLEKIVATSQARGYISGNQYILEPVNIINGNGYYNMFKPDG 463 

Query: 462 TYLFS INAKTGYFVGNAPGRADSLDY 487 

TYLFSIN KTGYFVGN G AD+LDY 
Sbjct: 464 TYLFS INCKTGYFVGNGKGYADALDY 489 

SEQ ID 2970 (GBS351) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 2; MW 57kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 5; MW 82kDa). 

GBS351-GST was purified as shown in Figure 216, lane 4. 
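
The molecular weights quoted for the His-fusion and GST-fusion products of the same protein can be cross-checked against each other. The sketch below uses only the values stated in the text for GBS255 and GBS351 (the dictionary layout and the function name `gst_minus_his` are illustrative). The difference in each case is of the order expected for the larger GST tag, which is roughly 26 kDa; that figure is general background and is not given in the source.

```python
# Apparent molecular weights (kDa) quoted in the text for each construct.
FUSION_MW_KDA = {
    "GBS255": {"His": 44.0, "GST": 69.0},
    "GBS351": {"His": 57.0, "GST": 82.0},
}


def gst_minus_his(mw):
    """Return the GST-fusion weight minus the His-fusion weight, in kDa."""
    return mw["GST"] - mw["His"]


if __name__ == "__main__":
    for name, mw in FUSION_MW_KDA.items():
        print(f"{name}: GST-fusion - His-fusion = {gst_minus_his(mw):.1f} kDa")
```

A difference far from this range for some other construct would suggest a transcription error in one of the two weights rather than a real effect.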

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 972 

A DNA sequence (GBSx1031) was identified in S.agalactiae <SEQ ID 2973> which encodes the amino 
acid sequence <SEQ ID 2974>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3169 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2975> which encodes the amino acid 
sequence <SEQ ID 2976>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3169 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 129/160 (80%) , Positives = 149/160 (92%) 

Query: 1 MTKEVWESFELDHTIVKaPYVRLISEEVGPVGDIITNFDIRLIQPNENAIDTAGrjHTIE 60 

MTKEV+VESFELDHTIVKAPYVRLISEE GP GD I TNFD+RL+QPN+N+ 1 +TAGLHTIE 
Sbjct: 1 MTKEVIVESFELDHTIVKAPYVRLISEEFGPKGDRITNFDVRLVQPNQNSIETAGLHTIE 60 

Query: 61 HLLAKLIRQRINGLIDCSPFGCRTGFHMIMWGKQDATEIAKVIKSSLEAIAGGVTWEDVP 120 

HLLAKLIRQRI+G+ IDCSPPGCRTGFH+ IMWGK +T+IAKVIKSSLE IA G+TWEDVP 
Sbjct: 61 HIiLAKLIRQRIDGMIDCSPFGCRTGFHLII>IWGKHSSTDIAKVIKSSLEEIATGITWEDVP 120 

Query: 121 GTTIESCGNYKDHSLHSAQEWAKLILSQGISDNAFERHIV 160 

GTT+ESCGNYKDHSL +A+EWA+LI+ QGISD+ F RH++ 
Sbjct: 121 GTTLESCGNYKDHSLFAAKEWAQLIIDQGISDDPFSRHVI 160 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 973 

A DNA sequence (GBSx1032) was identified in S.agalactiae <SEQ ID 2977> which encodes the amino 
acid sequence <SEQ ID 2978>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF34762 GB:AF228345 unknown [Listeria monocytogenes] 

Identities = 302/532 (56%), Positives = 400/532 (74%), Gaps = 14/532 (2%) 



Query: 4 IILAMVCALIGLIIGWAISMKMKSSiay^AELTLLNAEQDAVDLRGKAEIEAEHIRKAM 63 
I + ++ +L+ LI+G V S+ KSS E+ RG AE+ E +K AE 

Sbjct: 3 IAITIISSLLFLIVGLWGSLIFKSS TEKKLAAARGTAELIVEDAKKEAE 52 



Query: 64 RESKAHQKELLLEAKEEARKYREEIEKEFKSDRQELKQMEARLTDRASSLDRKDENLSNK 123 

+KE LLEAKEE + R EIE E + RE++ERL R +LDRKD +LS + 
Sbjct: 53 TT KKEALLEAKEENHRLRTE1ENELRGRRTETQKAENRLLQREENLDRKDTSLSKR 108 

Query: 124 EKMLDSKEQSLTDKSRHINEREQEIATLETTKKVEELSRIAEIiSQEEAKDIILADTEKDLA 183 

E L+ KE+S++ + + I E+E ++A + + EL RI+ LS+EEAK IIL E++L 
Sbjct: 109 EATLERKEES I S KRQQQIEEKESKLAEMIQAEQTELERI SALSKEEAKS 1 1 LNQVEEELT 16S 

Query: 184 HDIATRIKEAEREVKDRSNKIAKDLLAQAMQRLAGEYOTEQTITTVHLPDDNMKGRIIGR 243 

HD A +KE+E K+ S+K AK++L+ A+QR A ++V E T++ V LP+D MKGRIIGR 
Sbjct: 169 HDTAIMVKESENRAKEESDKKAKNILSrAIQRCAADHVAETTVSVVTLPNDEMKGRIIGR 228 

Query: 244 EGRNIRTLESLTGIDVIIDDTPEVWLSGFDPIRREIARMTLESLIQDGRIHPARIEELV 303 

EGRNIRTLE+LTGID+ 1 IDDTPE V+LSGFDPIRREIAR+ LE L+QDGRIHPARIEE+V 
Sbjct: 229 EGRNIRTLETLTGIDLIIDDTPEAVILSGFDP1RREIARIALEKLVQDGRIHPARIEEMV 288 

Query: 304 EKNRLEMDQRIREYGEAAAyEIGAPNLHPDLIKIMGRLQFRTSYGQNVLRHSVEVGKLAG 363 

+K R E+D+ IRE GE A +E+G ++HPDLIKI+GRL++RTSYGQNVL HS+EV KLAG 
Sbjct: 289 DKARKEVDEHIREVGEQATFEVGIHSIHPDLIKILGRLRYRTSYGQNVLNHSLEVSKLAG 348 

Query: 364 IIAGELGEITODIiARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPIVVNTIASHHG 423 

ILAGELGE+V LA+RAG LHD+GKAID E+EGSHVEIG+E A KYKE+ +V+N+IASHHG 
Sbjct: 349 ILAGELGEDVTLAKRAGLLHDIGKAIDHEIEGSHVEIGVELATKYKENDWINSIASHHG 408 

Query: 424 DVEPDSVIAVIVAAADALSSARPGARNESKENYIKRLRDLEEIANGFEGVQNAFALQAGR 483 

D E SVIAV+VAAADALS+ARPGAR+E++ENYI+RL LEEI+ ++GV+ ++A+QAGR 
Sbjct: 409 DTEaTSVIAVLVAAADALSAARPGARSETLENYIRRLEKLEEISESYDGVEKSYAIQAGR 468 

Query: 484 EIRIMVQPGKVSDDQWIMSHKTOEKIEQNLDYPGNIIOTTVIREMRAVDFAK 535 

E+RI+V+P + D ++ +R++IE+ LDYPG+IKVTVIRE RAV++AK 

Sbjct: 469 EVRIIVEPDTIDDLSSYRLARDIRI'CRIEEELDYPGHIICVTVIRETRAVEYAK 520 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2979> which encodes the amino acid 
sequence <SEQ ID 2980>. Analysis of this protein sequence reveals the following: 

Possible site: 32 



>>> Seems to have a cleavable N-term signal seq. 


Final Results 
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF34762 GB:AF228345 unknown [Listeria monocytogenes] 
Identities = 299/534 (55%) , Positives = 408/534 (75%) , Gaps = 14/534 (2%) 

Query: 2 VNIILLIVSALIGLILGYALISIRLKSAKEAAELTLLNAEQEAVDIRGKAEVDAEHIKKT 61
 + I + I+S+L+ LI+G + S+ KS+ E++ RG AE+ I +

Sbjct: 1 MTIAITIISSLLFLIVGLWGSLIFKSS TEKKLAAARGTAEL IVED 46 

Query: 62 AKRESKANRKELLLEAKEEARKYREEIEQEFKSERQELKQLETRLAERSLTLDRKDKNLS 121 
AK+E++ +KE LLEAKEE + R E1E E + R E ++ E RL +R LDRKD +LS 
Sbjct: 47 AKKEAETTIO<EALLEAKEENHRLRTEIENELRGRRTETQKAENRLLQREENLDRKDTSLS 106

Query: 122 SKEKVLDSKEQSLTDKSKHIDERQLQVEKLEEEKKAELEKVAAMTIAEAREVILMETENK 181 

+E L+ KE+S++ + + I+E++ ++ ++ + ++ ELE+++A++ EA+ +IL + E + 
Sbjct: 107 KREATLERKEESISKRQQQIEEKESKLAEMIQAEQTELERISALSKEEAKSIILNQVEEE 166 

Query: 182 LTHEIATRIRDAERDIKDRTVKTAKDLLAQAMQRLAGEYVTEQTITSVHLPDDNMKGRII 241

 LTH+ A ++++E K+ + K AK++L+ A+QR A ++V E T++ V LP+D MKGRII
Sbjct: 167 LTHDTAIMVKESENRAKEESDKKAKNILSIAIQRCAADHVAETTVSVVTLPNDEMKGRII 226

Query: 242 GREGRNIRTLESLTGIDVIIDDTPEVVILSGFDPIRREIARMTLESLIADGRIHPARIEE 301

GREGRNIRTLE+LTGID+ 1 IDDTPE VILSGFDPIRREIAR+ LE L+ DGRIHPARIEE 
Sbjct: 227 GREGRNIRTLETLTGIDLIIDDTPEAVILSGFDPIRREIARIALEKLVQDGRIHPARIEE 286 

Query: 302 LVEKNRLEMDNRIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTSFGQNVLRHSVEVGKL 361
 +V+K R E+D IRE GE A +E+G ++HPDLIKI+GRL++RTS+GQNVL HS+EV KL

Sbjct: 287 MVDKARKEVDEHIREVGEQATFEVGIHSIHPDLIKILGRLRYRTSYGQNVLNHSLEVSKL 346

Query: 362 AGILAGELGENVALARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPVVVNTIASH 421
 AGILAGELGE+V LA+RAG LHD+GKAID E+EGSHVEIG+E A KYKE+ VV+N+IASH
Sbjct: 347 AGILAGELGEDVTLAKRAGLLHDIGKAIDHEIEGSHVEIGVELATKYKENDVVINSIASH 406

Query: 422 HGDVEPDSVIAVLVAAADALSSARPGARNESMENYIKRLRDLEEIATSFDGVQNSFALQA 481

 HGD E SVIAVLVAAADALS+ARPGAR+E++ENYI+RL LEEI+ S+DGV+ S+A+QA
Sbjct: 407 HGDTEATSVIAVLVAAADALSAARPGARSETLENYIRRLEKLEEISESYDGVEKSYAIQA 466

Query: 482 GREIRIMVQPEKISDDQVVILSHKVREKIENNLDYPGNIKVTVIREMRAVDYAK 535

GRE+RI+V+P+ ID L+ +R++IE LDYPG+IKVTVIRE RAV+YAK 

Sbjct: 467 GREVRIIVEPDTIDDLSSYRLARDIRKRIEEELDYPGHIKVTVIRETRAVEYAK 520 

An alignment of the GAS and GBS proteins is shown below.

Identities = 451/535 (84%), Positives = 503/535 (93%)

Query: 1 MFNIIIAMVCALIGLIIGYVAISMKMKSSKEAAELTLLNAEQDAVDLRGKAEIEAEHIRK 60
 M NIIL +V ALIGLI+GY IS+++KS+KEAAELTLLNAEQ+AVD+RGKAE++AEHI+K
Sbjct: 1 MVNIILLIVSALIGLILGYALISIRLKSAKEAAELTLLNAEQEAVDIRGKAEVDAEHIKK 60

Query: 61 AAERESKAHQKELLLEAKEEARKYREEIEKEFKSDRQELKQMEARLTDRASSLDRKDENL 120
 A+RESKA++KELLLEAKEEARKYREEIE+EFKS+RQELKQ+E RL +R+ +LDRKDENL

 LE +K EL ++A ++ EA+++IL +TE

 L H+IATRI++AER++KDR+ K AKDLLAQAMQRLAGEYVTEQTIT+VHLPDDNMKGRI

 IGREGRNIRTLESLTGIDVIIDDTPEVV+LSGFDPIRREIARMTLESLI DGRIHPARIE

Query: 301 ELVEKNRLEMDQRIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTSYGQNVLRHSVEVGK 360

ELVEKNRLEMD RIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTS+GQNVLRHSVEVGK 
Sbjct: 301 ELVEKNRLEMDNRIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTSFGQNVLRHSVEVGK 360

Query: 361 LAGILAGELGENVDLARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPIVVNTIAS 420

 LAGILAGELGENV LARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHP+VVNTIAS
Sbjct: 361 LAGILAGELGENVALARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPVVVNTIAS 420

Query: 421 HHGDVEPDSVIAVIVAAADALSSARPGARNESMENYIKRLRDLEEIANGFEGVQNAFALQ 480

 HHGDVEPDSVIAV+VAAADALSSARPGARNESMENYIKRLRDLEEIA F+GVQN+FALQ
Sbjct: 421 HHGDVEPDSVIAVLVAAADALSSARPGARNESMENYIKRLRDLEEIATSFDGVQNSFALQ 480

Query: 481 AGREIRIMVQPGKVSDDQVVIMSHKVREKIEQNLDYPGNIKVTVIREMRAVDFAK 535

 AGREIRIMVQP K+SDDQVVI+SHKVREKIE NLDYPGNIKVTVIREMRAVD+AK
Sbjct: 481 AGREIRIMVQPEKISDDQVVILSHKVREKIENNLDYPGNIKVTVIREMRAVDYAK 535

SEQ ID 2978 (GBS86) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 7 (lane 6; MW 59kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 5; MW 84kDa). 

GBS86-GST was purified as shown in Figure 192, lane 3. 
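
As a rough plausibility check on the band sizes quoted for the fusion products, the sketch below estimates a molecular weight from a residue count. It is only an illustration under stated assumptions: the 535-residue length is taken from the alignment coordinates above, the 0.110 kDa average residue mass and the 26 kDa GST tag mass are common rules of thumb rather than values from this document, and the function name is hypothetical.

```python
# Rule-of-thumb constants (assumptions, not taken from the specification).
AVG_RESIDUE_KDA = 0.110   # approximate average mass of one amino acid residue
GST_TAG_KDA = 26.0        # approximate mass of a GST fusion partner


def estimate_mw_kda(n_residues: int, tag_kda: float = 0.0) -> float:
    """Estimate protein mass in kDa from its length plus an optional tag."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * AVG_RESIDUE_KDA + tag_kda


# Example: a 535-residue protein, alone and as a GST fusion.
untagged = estimate_mw_kda(535)
gst_fusion = estimate_mw_kda(535, GST_TAG_KDA)
print(f"untagged ~{untagged:.1f} kDa, GST fusion ~{gst_fusion:.1f} kDa")
```

Under these assumptions the estimates fall close to the 59kDa and 84kDa bands reported above; a His tag would add only a small amount to the untagged figure.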

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 974 

A DNA sequence (GBSx1033) was identified in S.agalactiae <SEQ ID 2981> which encodes the amino
acid sequence <SEQ ID 2982>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4984 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 975 

A DNA sequence (GBSx1034) was identified in S.agalactiae <SEQ ID 2983> which encodes the amino
acid sequence <SEQ ID 2984>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.87 Transmembrane 146 - 162 ( 146 - 162)

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8697> which encodes amino acid sequence <SEQ ID 8698>
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -10.72
GvH: Signal Score (-7.5): -5.6S
Possible site: 29
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -2.87 threshold: 0.0

INTEGRAL Likelihood = -2.87 Transmembrane 138 - 154 ( 138 - 154)
PERIPHERAL Likelihood =3.76 51
modified ALOM score: 1.07

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG21390 GB:AF302051 ABC transporter ATP binding subunit 
[Bacillus licheniformis] 
Identities = 84/218 (38%), Positives = 138/218 (62%), Gaps = 1/218 (0%)

Query: 12 DIIKVDHIFKSIGQKTILEDISFSIASNQCVALIGPNGAGKTTLMSTLLGDISISSGSLT 71

+++ + ++ K+ QKT ++ I FSI + VA++GPNGAGKTT +S +LG + ++G++T 
Sbjct: 3 NWSLTNVTKTFRQKTAVDQIDFSIKKGEIVAILGPNGAGKTTTISMILGLLKPTAGNIT 62 

Query: 72 IFNLPAHHNRLKyKVAILPQE-NVLPSKFTVRELIDFQRCLFPEVLPMSLILDYLQWSDT 130 
 +F+ H R++ K+ + QE +V+P E+I+ R +P+ L + +D

Sbjct: 63 LFDSMPHEKRVREKIGTMLQEVSVMPGLRCRVEIIELIRSYYPKPLSFQKLRTLTGLTDK 122 

Query: 131 mWPI^I^(^!^lJ^^^^J^^P^S^SGmTSTRQRmEhIATLy3m^ 190 
L+ E LSGGQKR L F L L G P+L+ DEPT GMD ++R RFW+ + +L ++G T 
Sbjct: 123 DLKTQAEKLSGGQKRRLGFAIALAGDPEajMIFDEPTVGMDITSRNRFWQTVQSLAEQGKT 182

Query: 191 IVYSSHYIEEVEHTADRILVLHKGKLLRDTTPLCHEAR 228 

I++S+HY++E + A RIL+ GK++ D TPL ++R 
Sbjct: 183 IIFSTHYLQEADDAAQRILLFKDGKIVADGTPLQIKSR 220 

There is also homology to SEQ ID 686. 

SEQ ID 8698 (GBS350) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 72 (lane 13; MW 28.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 4; MW 54kDa). 

GBS350-GST was purified as shown in Figure 226, lane 7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
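
The BLAST summary lines in these Examples (for instance "Identities = 84/218 (38%) , Positives = 138/218 (62%)") follow a fixed pattern, so they can be read mechanically. The sketch below is one possible way to do that; the function name and the returned dictionary layout are illustrative assumptions. It tolerates the stray spacing seen in this text and reports the exact ratio next to the percentage printed in the source, without assuming how that percentage was rounded.

```python
import re

# Matches e.g. "Identities = 84/218 (38%)"; a "-" is accepted in place of
# "=" because that substitution occurs in the extracted text.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*[=-]\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_blast_summary(line: str) -> dict:
    """Return counts, alignment length, reported percent and exact ratio."""
    fields = {}
    for name, hits, length, pct in _FIELD.findall(line):
        fields[name.lower()] = {
            "count": int(hits),
            "length": int(length),
            "reported_pct": int(pct),
            "ratio": int(hits) / int(length),
        }
    return fields


summary = parse_blast_summary(
    "Identities = 84/218 (38%) , Positives = 138/218 (62%) , Gaps - 1/218 (0%)"
)
for key, value in summary.items():
    print(key, value["count"], value["length"], f"{value['ratio']:.3f}")
```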

Example 976 

A DNA sequence (GBSx1035) was identified in S.agalactiae <SEQ ID 2985> which encodes the amino
acid sequence <SEQ ID 2986>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 977 

A DNA sequence (GBSx1036) was identified in S.agalactiae <SEQ ID 2987> which encodes the amino
acid sequence <SEQ ID 2988>. Analysis of this protein sequence reveals the following:



Possible site: 31

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.51 Transmembrane 222 - 238 ( 214 - 241)
INTEGRAL Likelihood = -6.90 Transmembrane 104 - 120 ( 101 - 125)
INTEGRAL Likelihood = -5.84 Transmembrane 140 - 156 ( 138 - 159)
INTEGRAL Likelihood = -5.20 Transmembrane 19 - 35 ( 18 - 41)
INTEGRAL Likelihood = -1.23 Transmembrane 164 - 180 ( 164 - 180)

Final Results

bacterial membrane --- Certainty=0.5203 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69806 GB:AJ243712 YVFS protein [Bacillus cereus]

Identities = 73/239 (30%) , Positives = 127/239 (52%) , Gaps = 4/239 (1%) 

Query: 9 KMEFLLTKRQLMaimiGMPVAFFLFFSGFMGEGLTKAIEAIYVRNYMimiAGFSSLSF 68 
K+E L T R + ++ MPV F+ F+ + + +Y+I+MA FS + 

Sbjct: 4 KIEILRTFRNKLFIFFSLLMPVMFYYIFTNWQ VPQNGDAWKAHYLISMATFSIVGT 60

Query: 69 AFFTFPFSMKDDQLSNRMQLLRHSPVPMWQYYLAKIIRILFYYCLAITWFLTGHILRQV 128 

A F+F + ++ LL+ +P+P Y AKII +1 V+F+ G ++ V 

Sbjct: 61 ALFSFGVRLSQERGQGWTHLLKITPLPEGAYLTAKIIAQTVVNAFSILVIFIAGILINHV 120 

Query: 129 SMPIEQWMQSFLLLLGGATCFIPFGLLVSYFKNTELMSMVANICYMSLAVLGGMWMPITM 188 

+ I QW+ + L LL G T F+ G ++ K + + +ANI MSLA++GG+WMPI + 
Sbjct: 121 ELTIGQWIGAGLWLLLGVTPFLALGTVIGSIKXADAAAGLANILNMSLAIVGGLWMPIEV 180 

Query: 189 FPKWLQALSKLTPTYHLTQVILSPFANSFAGF-SLIILIGYGIIMLVIAYLLSQKRHSI 246

FPK L+ + + TPTYH A G+ ++ +L GY +1 +V++ + +++ ++ 

Sbjct: 181 FPKILRTIGEWTPTYHFGSGAWDIVAGKSIGWENIAVLGGYFLIFVWSIYIRKRQEAV 239 

There is also homology to SEQ ID 682 and to SEQ ID 1628. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
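
The PSORT "INTEGRAL" lines reported in these Examples each give a likelihood score and the residue range of a predicted transmembrane segment. A minimal sketch of collecting them into a sorted list follows. The record type and function name are hypothetical, and the only assumption about the scores is the ordering visible in the text, where the most negative likelihood is listed first.

```python
import re
from typing import Iterable, List, NamedTuple


class Helix(NamedTuple):
    likelihood: float  # PSORT/ALOM likelihood value as printed
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral(lines: Iterable[str]) -> List[Helix]:
    """Extract predicted transmembrane segments, most negative score first."""
    helices = []
    for line in lines:
        match = _INTEGRAL.search(line)
        if match:
            helices.append(
                Helix(float(match.group(1)), int(match.group(2)), int(match.group(3)))
            )
    return sorted(helices, key=lambda h: h.likelihood)


report = [
    "INTEGRAL Likelihood = -6.90 Transmembrane 104 - 120 ( 101 - 125)",
    "INTEGRAL Likelihood =-10.51 Transmembrane 222 - 238 ( 214 - 241)",
]
for helix in parse_integral(report):
    print(helix.likelihood, helix.start, helix.end, helix.end - helix.start + 1)
```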

Example 978 

A DNA sequence (GBSx1037) was identified in S.agalactiae <SEQ ID 2989> which encodes the amino
acid sequence <SEQ ID 2990>. This protein is predicted to be histidine kinase. Analysis of this protein
sequence reveals the following:

Possible site: 49

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.43 Transmembrane 105 - 121 ( 102 - 124)
INTEGRAL Likelihood = -6.95 Transmembrane 130 - 146 ( 129 - 149)

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9537> which encodes amino acid sequence <SEQ ID 9538> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB54584 GB:AJ006400 histidine kinase [Streptococcus pneumoniae] 
Identities = 138/350 (39%), Positives = 212/350 (60%), Gaps = 3/350 (0%) 

Query: 11 MYFIPLVFLIYPIGGILYYHYPFWTLFFTLAWGAYLYSVIIRGESKYHMIAWSTMLTYI 70 
 M++I L+F+I+PI ++ W L + FV AYL V+ + + W MLTY+

Sbjct: 11 MFWISLIFMIFPILSVVTGWLSAtfflLLIDILFWAYL-GVLTrKSQRLSWLYWGLMLTYV 69 

Query: 71 FYMTIFINSGFIWYIYFLSNLLVYRFRDK-LKSFRFISFACTLATWF-LCFFKASDFGD 128 
T F+ +IW+ +FLSNLL Y F + LKS +F W L F+ + 

Sbjct: 70 VGNTAFVAVNYIWFFFFLSNLLSYHFSVRSLKSLHVWTFLLAQVLWGQLLIFQRIEVEF 129

Query: 129 RIMFLIVPIFCIGYMWIAIENRNSEEQREKXAEQNQYINILSAENERNRIGRDLHDSLGH 188 

L++ F + + R E+ +E +QN IN+L AENER+RIG+DLHDSLGH 

Sbjct: 130 LFYLLVILTFVDLMTFGLWIRIVEDLKEAQWQNAQINLLLAENERSRIGQDLHDSLGH 189 

TFAM+++KT+LAL+L + Y +V+KEL E++ IS SM+EVR IV NLK RT+ E++ 
Sbjct: 190 TFAMLSWTDLALQLFQMEAYPQVEKELKEIHQISKDSMNEVRTIVENLKSRTLTSELET 249 

Query: 249 LYRLFQLSNIKLTVVNKLETSQLSPVTQSTITMILKELSNNIVKHAEADSVELSLVRQGA 308

+ ++ +++ i++ V N L+ S L+ +ST +MIL EL NI+KHA+A V L L R 
Sbjct: 250 VKKMLEIAGIEVQVENHLDKSSLTQELESTASMILLELVTNIIKHAKASKVYLKLERTEK 309 

Query: 309 TINIEMIDNGCGFTNLDGDELHSIQERLTIVEGTLTILSRSKPTHIQWL 358 
 + 4- + D+GCGF ++ GDELH+++ R+ G ++++S+ PT +QV L

Sbjct: 310 ELILTVRDDGCGFASISGDELHTVRNRVFPFSGEVSVISQKHPTEVQVRL 359 

There is also homology to SEQ ID 2992. 

A related GBS gene <SEQ ID 8699> and protein <SEQ ID 8700> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 10.90
GvH: Signal Score (-7.5): -2.42
Possible site: 49
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 2 value: -7.43 threshold: 0.0

INTEGRAL Likelihood = -7.43 Transmembrane 105 - 121 ( 102 - 124)
INTEGRAL Likelihood = -6.95 Transmembrane 130 - 146 ( 129 - 149)
PERIPHERAL Likelihood =0.16 61
modified ALOM score: 1.99

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 979

A DNA sequence (GBSx1038) was identified in S.agalactiae <SEQ ID 2993> which encodes the amino
acid sequence <SEQ ID 2994>. This protein is predicted to be response regulator. Analysis of this protein
sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 49 - 65 ( 49 - 65)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54585 GB:AJ006400 response regulator [Streptococcus pneumoniae] 
Identities = 95/153 (62%), Positives = 125/153 (81%), Gaps = 3/153 (1%) 



Query: 1 MKLLVAEDQSMLRDAMCQLLLMEESVSTIDQAGNGGEAIAILSNKAIDVAILDVEMPILS 60

 MK+LVAEDQSMLRDAMCQLL+++ V ++ QA NG EAI +L +++D+AILDVEMP+ +
Sbjct: 1 MKVLVAEDQSMLRDAMCQLLMLQPDVESVFQAKNGQEAIQLLEKESVDIAILDVEMPVKT 60

Query: 61 GLDVLEWVRKYQ-NVKVIIVTTFKRSGYFQRAIRSNVDAYVLKDRSVADLMKTIQKVLSG 119

 GL+VLEW+R + KV++VTTFKR GYF+RA+++ VDAYVLK+R+ +ADLM+T+ VL G
Sbjct: 61 GLEVLEWIRAEKLETKVVVVTTFKRPGYFERAVKAGVDAYVLKERNIADLMQTLHTVLEG 120

Query: 120 GKEYSPELMENVI--SNPLSEQEIKILSLIAQG 150

 KEYSPELME V+ NPL+EQEI +L IAQG
Sbjct: 121 RKEYSPELMEVVMMHPNPLTEQEIAVLKGIAQG 153



There is also homology to SEQ ID 2996. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 980 

A DNA sequence (GBSx1039) was identified in S.agalactiae <SEQ ID 2997> which encodes the amino
acid sequence <SEQ ID 2998>. Analysis of this protein sequence reveals the following: 



Possible site: 34

>>> Seems to have an uncleavable

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 



N-term signal seq 
9 Transmembrane 1 
4 Transmembrane 
3 Transmembrane 
9 Transmembrane 1 
6 Transmembrane 



73 - 92) 
102 - 119) 
38 - 59) 



Final Results

bacterial membrane --- Certainty=0.3675 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB85965 GB:AE000909 unknown [Methanothermobacter thermoautotrophicus]
Identities = 46/183 (25%), Positives = 81/183 (44%), Gaps = 11/183 (6%)

Query: 5 KERFDTLSDAILAIAMTILVLEI KTPATMGDIGDFTRNIGLFIVSFWVFNFW 57 

 K+R + L DAI AIAMTILVL I PA I ++ + +SF+++ FW

Sbjct: 6 KKRLEGLVDAIFAIAMTILVLGIDVPTGTMSVPAMDAYIMGLASDLYSYCLSFLLLGVFW 65

Query: 58 YERAQNSLDAQKTNDEIIALDIIEHLGICLIFLFTKFMISFENHNFAVMAYGLLTLLVGL 117

+ + +K + I + + L+P TK ++ + + + L L 4GL 

Sbjct: 66 WVNHMHFEKLEKVDTGFIWINIWLMVVVLVPFSTKiTGNYGDLWPNILFHLNMLTIGL 125 

Query: 118 TSDIIRIRLASYDLVTIPSELKERVIKVMTTFAIESVWRFIIIIIAYFLPEVGIFAYLV 177 

+ 1 L+ I 4-+K + ++ + +IL PE AY V 
Sbjct: 126 LLSMSWIYTQRNGLMDIGENEYRLILKKNLLMPLAAI LALILTPIAPEYSSTAYAV 181 

Query: 178 IPL 180 
+ L 

Sbjct: 182 LIL 184 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
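
Each "Final Results" block in these Examples assigns a certainty value to three candidate locations (cytoplasm, membrane, outside). The sketch below shows one way such a block could be read and the highest-scoring location selected. It is an illustration only: the function names are hypothetical, and the pattern is deliberately tolerant of the stray spaces that appear inside the numbers in this text (for example "0 .3972").

```python
import re
from typing import Dict, Iterable

# Captures the location name and the certainty digits, allowing spaces
# that extraction has inserted inside the number.
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\D*Certainty\s*=\s*([\d .]+?)\s*\("
)


def parse_final_results(lines: Iterable[str]) -> Dict[str, float]:
    """Map each reported location to its certainty value."""
    scores = {}
    for line in lines:
        match = _RESULT.search(line)
        if match:
            scores[match.group(1)] = float(match.group(2).replace(" ", ""))
    return scores


def predicted_location(scores: Dict[str, float]) -> str:
    """Return the location with the highest certainty."""
    return max(scores, key=scores.get)


block = [
    "bacterial membrane --- Certainty=0 .3972 (Affirmative) < suco",
    "bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco",
    "bacterial cytoplasm --- Certainty=0. 0000 (Not Clear) < suco",
]
scores = parse_final_results(block)
print(scores, predicted_location(scores))
```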

Example 981 

A DNA sequence (GBSx1040) was identified in S.agalactiae <SEQ ID 2999> which encodes the amino
acid sequence <SEQ ID 3000>. This protein is predicted to be guanylate kinase (gmk). Analysis of this
protein sequence reveals the following:

Possible site: 16

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKFDYSVSMTTRPQRPGEVDGVDYFFRTRE 60

M ERGLLIV SGPSGVGKGTVRQ IFS D KF+YS+S+TTR R GEV+GVDYFF+TR+ 
Sbjct: 41 MKERGLLIVLSGPSGVGKGTVRQAIFSQEDTKFEYSISVTTRSPREGEWSVDYFFKTRD 100 

Query: 61 EFEALIKEGQMLEYAEYVGNYYGTPLSYVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF 120 

EFE +1 + ++LE+AEYVGNYYGTP+ YV +TL G DVFLEIEVQGALQV++ P+G+F 
Sbjct: 101 EFEQMIADNKLLEWAEYVGNYYGTPVDYVEQTLQDGiCDVFLEIEVQGALQVRNAFPEGLF 160 

Query: 121 IFLTPPDLEELEERLVGRGTDSPEVIAQRIERAKEEIALMREYDYAWNDQVSIAAERVK 180 

IFL PP L EL+ R+V RGT++ +1 R++ AK EI +M YDY V ND V A +++K 
Sbjct: 161 IFLAPPSLSELKiraivTRGTETDALIEMiMKAAKAEIEh 220 

Query: 181 RVIEAEHYRVDRVIGRYTNMVK 202 

++ AEH + +RV RY M++ 
Sbjct: 221 AIVLAEHLKRERVAPRYKKMLE 242 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3001> which encodes the amino acid 
sequence <SEQ ID 3002>. Analysis of this protein sequence reveals the following: 

Possible site: 16
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB13441 GB:Z99112 similar to guanylate kinase [Bacillus subtilis] 
Identities = 123/203 (60%), Positives = 157/203 (76%)

 +FL PP L EL++R+V RGT++ +1 R++ M EI tM TOY V ND V A +++K

 1+ EH + ERV RY KM+++
 MVLAEHLKRERVAPRYKKMLEV 243

An alignment of the GAS and GBS proteins is shown below. 

Identities = 186/204 (91%), Positives = 197/204 (96%) 
Query: 1 MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKFDYSVSMTTRPQRPGEVDGVDYFFRTRE 60
Sbjct: 

Query: 61 EFEALIKEGQMLEYAEYVGNYYGTPLSYVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF 120

EFE LIK GQMLEYAEYVGNYYGTPL+YVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF 
Sbjct: 61 EFEELIKTGQMLEYAEYVGNYYGTPLTYVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF 120 

Query: 121 IFLTPPDLEELEERLVGRGTDSPEVIAQRIERAKEEIALMREYDYAVVNDQVSIAAERVK 180 

+FLTPPDL+ELE+RLVGRGTDS EVIAQR1ERAKEEIALMREYDYAWND+V+LAAERVK 
Sbjct: 121 VFLTPPDLDELEDRLVGRGTDSQEVIAQRIERAKEEIALMREYDYAVVNDEVALAAERVK 180 

Query: 181 RVIEAEHYRVDRVIGRYTNMVKET 204 

R+IE EH+RV+RVIGRY M+K T 
Sbjct: 181 RIIETEHFRVERVTGRYDKMIKIT 204 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
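
The "Identities" figure quoted with each alignment is the number of columns in which the Query and Sbjct rows carry the same residue. The following sketch recomputes that count from a pair of aligned rows and rebuilds the identity part of the middle line. It is a simplified illustration with hypothetical function names: the "+" marks for conservative substitutions ("Positives") depend on a scoring matrix and are not reproduced here.

```python
from typing import Tuple


def percent_identity(query: str, sbjct: str) -> Tuple[int, int, float]:
    """Return (identical columns, aligned columns, percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    if not query:
        raise ValueError("aligned rows must not be empty")
    # A gap character never counts as a match, even if both rows have one.
    matches = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return matches, len(query), 100.0 * matches / len(query)


def identity_midline(query: str, sbjct: str) -> str:
    """Rebuild the middle line, showing only exact matches."""
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, sbjct))


# Short toy rows taken from the start of an alignment in this Example.
q_row = "MSERGLLIV"
s_row = "MKERGLLIV"
print(percent_identity(q_row, s_row))
print(repr(identity_midline(q_row, s_row)))
```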

Example 982 

A DNA sequence (GBSx1041) was identified in S.agalactiae <SEQ ID 3003> which encodes the amino
acid sequence <SEQ ID 3004>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1763 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3005> which encodes the amino acid 
sequence <SEQ ID 3006>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1551 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 95/105 (90%) , Positives = 100/105 (94%) , Gaps = 1/105 (0%) 

Query: 1 MMLKPSIDTLLDKVPSKYSLVILQAKRAHELEAGEKATQDFKSVKSTLRALEEIESGNW 60 
 MMIiKPSIDTIjLDKVPSKYSLVIIiQRKRAHELEAG tq+fksvkstl+aleeiesgnvv

Sbjct: 1 MMLKPSIDTLLDKVPSKYSLVILQAKEAHELEAGATPTQEFKSVKSTLQALEEIESGHW 60 

Query: 61 IHPDPSAKRASVRARIEAERLAKEEEERKIKEQIAKEK-EDGEKI 104 
IHPDPSAKR +VRA+IEAERI1AKEEEERKIKEQIAKEK E+GEKI 
Sbjct: 61 IHPDPSAKREAVRAKIEAERLAKEEEERKIKEQIAKEKEEEGEKI 105

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 983 

A DNA sequence (GBSx1043) was identified in S.agalactiae <SEQ ID 3007> which encodes the amino
acid sequence <SEQ ID 3008>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3413 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13444 GB:Z99112 primosomal replication factor Y (primosomal 
protein N') [Bacillus subtilis]
Identities = 377/807 (46%), Positives = 529/807 (64%), Gaps = 21/807 (2%)

Query: 6 AQVIVDIPLMQTDKPFSYAIPKDLEDLVQVGVRVHVPFGRGKRLLQGFWGFRDDDELET 65 

A+VIVD+ D+PF Y IP L+ +++ G+RV VPFG R +QGFV ++ +L 

Sbjct: 4 AEVIVDVSTKNIDRPFDYKIPDHLKGMIKTGMRVIVPFGP- -RKIQGFVTAVKEASDIjSG 61 

Query: 66 KDIAEV---LDFEPVLNQEQLDLADQMRHrVFSYKISILKSMLPSLIaNSQYDKLLL---A 119 

K + EV LD PVL +E + L4 + S+KI+ L++MLP+ L ++Y+K L 

Sbjct: 62 KSVKEVEDLLDLTPVLTEELMILSSWLSDKTLSFKITALQAMLPAALKAKYEKELKIAHG 121 

Query: 120 TDTLPSEDREDLFGHKTEIVFSSLSSQDAKKA-GRLIQKGFIEVQYLAKDKKTIKTEKIY 178 

D P +R LF +++S + + K R +QKG I+V Y K K + 

Sbjct: 122 ADLPPQVER--LFSETKTLLYSDIPDHETI J KI J IQRHVQKGDIDVTYKVAQKTNKKMVRHI 179 

Query: 179 KINRTLLEKSQ IAARAKKRLELKEFLLENPQPGRLTAM KQFS S PWNFFRE 230 

+ N + E ++ ++ +A K+ + FL+ P+ ++ A SS + + 

Sbjct: 180 QANASKEELAKQAEGLSRQAAKQQAILHFLISEPEGVKIPAAELCKKTDTSSATIKTLIQ 239 

Query: 231 EGIIEVIEKFASRSDOTFKGILKTDFLDI^QEQAKVVKIvVDQIGKEQNKPFLLEGITGS 290 

+G+++ +E R K KT+ L L EQ + + + + +++K FLL G+TGS 

Sbjct: 240 KGLLKESYEEVYRDPYQDKMFKKTEPLPLTDEQRAAFEPIRETLDSDEHKVFLLHGVTGS 299 

Query: 291 GKTEVYLHIIDMVLKLGKTAIVLVPEISLTPQMTNRFISRFGKQVAIMHSGLSEGEKFDE 350 

GKTE+YL 1+ VL GK AIVLVPEISLTPQM NRF RFG QVA+MHSGLS GEK+DE 
Sbjct: 300 GKTEIYLQSIEICVI J AKGKEAIVLVPEISLrFO^™ FKGRFGS Q VAT ^ SGLS ' I ' GEKYDE 359 

Query: 351 TOKIKSGQAKVWGARSAIFAPLENIGAIIIDEEHESTYKQESNPRYHARDVALLRAEYY 410 

WRKI + ++WGARSAIFAP EN+G IIIDEEHES+YKQE PRYHA++VA+ RAE++ 
Sbjct: 360 WRKIHRKEVRLWGARSAIFAPFENLGMIIIDEEHESSYKQEEMPRYHAKEVAIKRAEHH 419

Query: 411 KAVLLMGSATPSIESRARASRDVYKFLELKHPANPKARIPQVEIIDFPJflFIGQQEVSNFT 470

+++GSATP++ES ARA + VY+ L LKHR N + +P+V ++D R + S F+ 

Sbjct: 420 SCPVVLGSATPTriESYARAQKGVYELLSLKHRVNHRV-MPEVSIiVDMREELRIJGKRSMFS 478 

Query: 471 SYLLDKIRDRLDKKEQWLMLNRRGYSSFIMCRDCX3YVDQCPNCDISLTLHMATKTMNCH 530 

L++K+ + + K EQ VL LN+RGYSSF+MCRDCGYV QCP+CDIS+T H + + CH 
Sbjct: 479 VELMEKLEETIAKGEQAVLFLNKRGYSSFVMCRDOSYVPQCPHCDISMTYHRYGQRLKCH 538 

Query: 531 YCGFEKPIPRTCPNCNSKSISYYGTGTQKAYEELLKVIPDAKIIjRMDVDTTRQKGGHESI 590 

YCG E+P+P TCP C S+ I ++GTGTQ+ EEL KV+P A+++RMDVDTT +KG HE + 
Sbjct: 539 YCGHEEPVPHTCPECASEHIRFFGTGTQRVEEELTKVLPSARVIRMDVDTTSRKGAHEKL 598 

Query: 591 LKRFGNHEADILLGTQMIAKGLDFPNVTLVGVIJ>IADTSmLPDFRSSERTFQLLTQVAGR 650 

L FG +ADILLGTQMIAKGLDFPNVTLVGVL4-ADT+L++PDFRS+E+TFQLLTQV+GR 
Sbjct: 599 LSAFGEGKADILLGTQM1AI<;GLDFPNVTLVGVL3ADTTLHIPDFRSAEKTFQLLTQVSGR 658 

Query: 651 AGRAEKEGEWIQTYNPNHYAIQLAQKQDFEAFYQYEMNIRRQLGYPPYYFTVGLTLSHK 710 

AGR EK G V+IQTY P+HY+IQL + D-E FYQ+EM RR+ YPPYY+ +T+SH+ 
Sbjct: 659 AGRHEKPGHVIIQTYTPSHYSIQLTKTHDYETFYQHEMAHRREQSYPPYYYLALVTVSHE 718 

Query: 711 DEEMLIRKSYEVLSLLKQGFSDKVKLLGPTPKPIARTHNLYHYQIIIKYRFEDNLELVLN 770 

+ + ++ LK K+LGP+ PIAR + Y YQ +IKY+ E L +L 

Sbjct: 719 EVAKAAVTAEKIAHFLKANCGADTKILGPSASPIARIKDRYRYQCVIKYKQETQLSALLK 778 

Query: 771 RLLD-MTQDKENRDLRLAIDHEPQNMM 796 

++L+ ++ E + + ++ID P MM 
Sbjct: 779 KILEHYKREIEQKHVMISIDMNPYMMM 805 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3009> which encodes the amino acid 
sequence <SEQ ID 3010>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1396 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 556/793 (70%), Positives = 659/793 (82%), Gaps = 1/793 (0%) 

Query: 4 KLAQVIVDIPLMQTDKPFSYAIPKDLEDLVQVGVRYHVPFGRGNRLLQGFWGFRDDDEL 63 

K+A VI VDI PLMQTDKPFSY IPK+L LVQ+G RVHVPFG+GNRLLQGF++GF +D 
Sbjct: 12 KVAHVIVDIPLMQTDKPFSYGIPKELVSLVQLGSRVHVPFGKGNRLLQGFIIGFGQEDSS 71 

Query: 64 ETKDIAEVLDFEPVLNQEQLDIADQMRHTVFSYKISILKSMLPSLIjNSQYDKLLLATDTL 123 

K I VLD EPVLNQEQL LADQ+R TVFSYKI++LK+M+P+LLNS YDK+L L 
Sbjct: 72 SLKLIQTVLDPEPVIiNQEQLTILADQLRKTVFSYKITLLKAI^IPNIjLNSNYDKVLRPESGL 131 

Query: 124 PSEDREDLFGHKTEIVFSSLSSQDAKKAGRilQKGFIEVQYLAKDKKTIKTEKIYKINRT 183 

DR+ LF K +++S+L + KA+IQGIV YLAKDKK +KTEK Y ++ 
Sbjct: 132 KKSDRDFLFEGKPSVLYSTLDREKEKIALKGIQAGHITVSYIAKDKKNLKTEKYYHVDLD 191 

Query: 184 LLEKSQIAARAKKRLELKEFLLENPQPGRLTALNKQFSSPWNFFREEGIIEVIEKEASR 243 

L I++RAKKR LK++LL + + +L L + FS W +F +1 + E+ R 
Sbjct: 192 AIjAVHPISSRAKKRQLLKDYI^THTKEAKIATLYQAFSRDWAYFVTNHLIRIDERPIDR 251 

Query: 244 SDNYFKGILKTDFLDIiNQEQAKVVKIVVDQIGKEQNKPFLLEGITGSGKTEVYLHIIDNV 303 

S++YF I + FL LN++QR V +V+QIGK +KPFL+EGITGSGKTEVYLHI I + V 
Sbjct: 252 SESYFDQIKPSSFLTLNEQQASAVTEIVEQIGKP-SKPFLIEGITGSGKTEVYLHIIEAV 310 

Query: 304 LKLGKTAIvLVPEISLTPQMTNRFISRFGXQVAIMHSGLSEGEKFDEWRKIKSGQAKVW 363 

LK KTAIVLVPEISLTPQMT+RFISRFGKQVAIKHSGLS+GEKFDEWRKIK+GQAKWV 
Sbjct: 311 LKQDKTAIVLVPEISLTPQMTSRFISRFGKQVAIMHSGLSDGEKFDEWRKIKTGQAKVW 370

Query: 364 GARSAIFAPLENIGAIIIDEEHESTYKQESNPRYHARDVALLRAEYYKAVLLMGSATPSI 423

GARSAIF+PLE IGAIIIDEEHESTYKQESNPRYHARWALLRA++++AV++MGSATPSI 
Sbjct: 371 GARSAIFSPLERIGAI I IDEEHESTYKQESNPRYHAREVALLRAKHHQAWVMGSATPSI 430 

Query: 424 ESRARASRDVYKFLELKHRANPKARI PQVEI IDFRNFIGQQEVSNFTSYLLDKIRDRLDK 483 

ESRARAS+ VY F++L RANP A+IP+V I+DFR++IGQQ VSNFT YL+DKI++RL K 
Sbjct: 431 ESRARASKGVYHFIQLTQRANPLAKIPEVTIVDFRDYIGQQAVSNFTPYL1DKIKERLVK 490 

Query: 484 KEQ WLMIJSIRRGYS S F IMCRDCGYVDQCPNCDI SLTLHMATKTI'MCHYCGFEKP I PRTCP 543 

KEQWLMLNRRGYSSF+MCRDCGYVD+CPNCDISLTLHM TKTMNCHYCGF+KPIP TCP 
Sbjct: 491 KEQVVLMLNRRGYSSFVMCRDCGYVDKCPNCDISLTLH^©TKTMNCHYCGFQKPIPITCP 550 

Query: 544 NCNSKS I S YYGTGTQKAYEELLKVI PDAKI LRMDVDTTRQKGGHES I LKRFGNHEAD I LL 603 

C+S SI YYGTGTQKA++EL VIP+AKILRMDVDTTR+K H++IL FG EADILL 
Sbjct: 551 ECHSNSIRYYGTGTQKAFDELQGVIPEAKILRMDVETTRMCRSHKTILDSFGRQEADILL 610 

Query: 604 GTQMIAKGLDFPNVTLVGVLNADTSLNLPDFRSSERTFQLLTQVAGRAGRAEKEGEWIQ 663 

GTQMIAKGLDFPNVTLVGVLNADTSLHLPDFR+SE+TFQLLTQVAGRAGRA K GEV+IQ 
Sbjct: 611 GTQMIAKGLDFPNVTLVGVLNADTSLNLPDFRASEKTFQLLTQVAGRAGRAHKPGEVLIQ 670 

Query: 664 TYNPNHYAIQLAQKQDFEAFYQYEMNIRRQLGYPPYYFTVGLTLSHKDEEWLIRKSYEVL 723 

TYNP+HYAIQLA+KQDFEAFY+YEM+IR Q+ YPPYYFTVG+TLSH+ E +++K+Y+V 
Sbjct: 671 TYWPDHYAIQLAKKQDFFAFYRYEMSIRHQMAYPPYYFTVGITLSHRLEASVVKKAYQVT 730 

Query: 724 SLLKQGFSDKVKLLGPTPECPIARTHNLYHYQIIIKYRFEDNLELVLNRLLDMTQDKENRD 783 

LLK SD +K+LGPTPKPIARTHNLYHYQI++KYRFEDNLE LNR+LD +Q+ +NR 
Sbjct: 731 ELLKSHLSDNIKILGPTPKPIARTHNLYHYQILLKYRFEDNLEET1MRILDWSQEADNRH 790 

Query: 784 LRLAIDHEPQNMM 796 

L+L ID EPQ + 
Sbjct: 791 LKLI IDCEPQQFL 803 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 984 

A DNA sequence (GBSx1044) was identified in S.agalactiae <SEQ ID 3011> which encodes the amino
acid sequence <SEQ ID 3012>. This protein is predicted to be methionyl-tRNA formyltransferase (fmt).
Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1329 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13446 GB:Z99112 methionyl-tRNA formyltransferase [Bacillus subtilis] 
Identities = 155/314 (49%) , Positives = 221/314 (70%) , Gaps = 7/314 (2%) 

Query: 1 MTKLLFMGTPDFSATVLKGIIAIX3KYDvT^VOTQ 60
 MT+++FMGTPDFS VL+ ++ DG Y+V+ WTQPDR GRKK + PVKE AL + IP
Sbjct: 1 MTRIVFMGTPDFSVPVLRTLIEDG- YEWGWTQPDRPKGRKIOTLTPPPVKEEALRHGIP 59

Query: 61 VYQPEKLSGSPELEQLMTLGADGIVTA^FGQFLPTKLLESVGFA-INvHASLIjPKYRGGA 119

 V QPEK+ + E+E+++ L D IVTAAFGQ LP +LL+S + INVHASLLP+ RGGA
Sbjct: 60 VLQPEKWLTEEIEKOTiALKPDLIWAAFGQILPKELLDSPKYGCINVHASLLPELRGGA 119

Sbjct: 120 PIHYSILQCSKKKTGITIMm^KLDAGDMISIWEVDIEETDNVGTLHDKLSVAGAICDLSE 179

Query: 180 TLPGYLSGDIKPIPQNEEEVSFSPNISPDEERIDWKSSRDIFNHTOGMYPWPVAHTLLE 239 

T+P ++G I P Q+EE+ +++PNI ++E +DW+++ +++N +RG+ PWPVA+T L 
Sbjct: 180 TVPNVIAGSISPEKQDEEKATYAPNIKREQELLDWSRTGEELYNQIRGIjHPWPVAYTTLN 239 

Query: 240 GmFIOLY--EVTMSEGKGSPGQVIAKTKNSLTVATG-DGAIELKSVQPAGKPRMDIKDFL 296 

G K++ + + PG V+A K + VATG + A+ L +QPAGK RM +DF+ 

Sbjct: 240 GQNLKIWASKKIAAPTTAEPGTWAVEKEGIIVATGNETALLLTELQPAGKKRMKGEDFV 299 

Query: 297 NGVGRNLEIGDKFG 310 

G ++E GD G 
Sbjct: 300 RGA- -HVFAGDVLG 311 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3013> which encodes the amino acid
sequence <SEQ ID 3014>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0730 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 217/310 (70%), Positives = 266/310 (85%)

MTKLLFMGTPDFSATVLKGnMGKYTWIAV^ 60
M KLLFMGTP FSATVLKG+L + Y++L WTQPDRAVGRKK+ 1 K+TPVK+ +ALE+ I
MIIOjLFMGTPQFSATVIjKGLLDNPAYEILGVWQPDRAVGRKKDIKVTPVKQLALEHGIS 60



+YQPEKLSGS EL +-1-M LGADGI +TAAFGQFLPT LL+SV FAINVHASLLPKYRGGAP 



LP YLSG++KPIPQ+ + +FSPNISP+ E++DW S++++FNH+RGM PWPVAHT LEG 



R K+YE ++EG+G PGQV+ KTK SL +ATG GA+ L VQPAGKP+M I DFLNG+G 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
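
The alignment statistics quoted in these Examples can be re-derived from the raw counts. The following is a minimal sketch, assuming the statistics line has the form "Identities = n/len (p%), Positives = n/len (p%)"; the function names are illustrative and not part of the original analysis. Integer truncation is assumed for the percentage because it reproduces the figures quoted for the GAS/GBS alignment above.

```python
import re

# Matches e.g. "Identities = 217/310 (70%)"; whitespace around "=" and "/" is optional.
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line):
    """Return {name: (count, alignment_length, reported_percent)} for one statistics line."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in STAT_RE.findall(line)}


def truncated_percent(count, length):
    """Percentage rounded down to a whole number (assumed convention)."""
    return 100 * count // length


stats = parse_blast_stats("Identities = 217/310 (70%), Positives = 266/310 (85%)")
for name, (count, length, reported) in stats.items():
    print(name, count, length, reported, truncated_percent(count, length))
```

Comparing the recomputed value with the reported one is a quick way to spot digits damaged during text extraction.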

Example 985 

A DNA sequence (GBSx1045) was identified in S.agalactiae <SEQ ID 3015> which encodes the amino
acid sequence <SEQ ID 3016>. This protein is predicted to be sunL protein (sun). Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



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Final Results 

bacterial cytoplasm --- Certainty=0.1677 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA10711 GB:AJ132604 sunL protein [Lactococcus lactis] 
Identities = 222/434 (51%), Positives = 305/434 (70%), Gaps = 15/434 (3%) 

KSARGLaLMTLEEVFDKGAYSNIAtNKSLKKSRLSDKDRALVTEI VYGTVARKITLEWYL 6 6 
K+AR Mi L ++F AY+NI+L+++L+ S LS D+ VT +VYG V++K LEWY+ 
KNARQTALDVLNDIFGNDAYANISLDRNLRDSELSTVDKGFOTALVYGWSKKALLEWYI 62 






y G + + VLDAC+APGGK+ +H+A YLTTG +TALDLY+HKL+L+ +NA+R ++DKI T+ 



++L+K GI+ YSTCTIF+EENF V+ •I-FLENHPNFEQVE+S+ + +++K GC+ I+PE 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3017> which encodes the amino acid
sequence <SEQ ID 3018>. Analysis of this protein sequence reveals the following:



Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

= 13/433 (3%) 

Query: 7 KSTRGKALLVIEAIFDQGAYTNIAI^C^LSNKALSAKDPjyijLTEIvYGTVSRKISIiEWYL 66 

K+ R AL V+ IF AY NI+L++ L + LS D+ +T +VYG VS+K LEWY+ 
Sbjct: 3 KNARQTALDVIiNDIFGNDAYANISLDRNLRDSELST^/DKGFOTALWGWSKKALLEWYI 62 

Query: 67 AHYVKDRDKLDKWVYYLLMLSLYQLTYLDKLPAHAIVITOAVGIAKNRGNKKGAEKFVNAI 126 

+K K W LL+L++YQ+ ++DK+P A V++AV IAK R + + F+NA+ 
Sbjct: 63 TPLLKKEPK- -PWAFMLLLLTIYQ^/LFMDKVPISAAVTDEAVKIAK-RHDGQATANPINAV 119 
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Query: 127 LRQFTSHPLPDMETIKRRNKYYSVKYSLPWJLVKKLEDQFGSDRSVAIMESLFVRSKASI 186 

LR F E K + KYS+P L+ K+ QFG R+ I+ESL S S+ 

Sbjct: 120 LRNFMRS EHRNEEPKDWETKYSMPKLLLDKMVRQFGGKRTGEILESLEKPSHVSL 174 

Query: 187 RVTDPLKLEEVAEALDAERSLLSATGLTKASGHFAASDYFTNGDITIQDESSQLVAPTIiN 246 

R DP E SLL+ T L SG+F+ ++ F G ITIQDE+SQLVAP L 

Sbjct: 175 RKIDP TVEIAGTRPSLLTETALIADSGNFSITEEFQTGRITIQDETSQLVAPQLE 229 

Query: 247 IDGDDIILDACSAPGGKTSHIASYLKTGIWIALDLYDHKLELVKENANRLGVADWIETRK 306 

++G + +LDAC+APGGK+ +H+A YL TG + ALDLY+HKL+L+ +NA R VAD I T+K 
Sbjct: 230 LEGTEEVLDACAAPGGKSTHMAQYLTTGHITALDLYEHKLDLINQNAQRQHVADKITTQK 289 

Query: 307 LDAREVHRHFEKDSFDKILVDAPCSGIGLIRRKPDIKYNKESQGFNALQAIQLEILSSVC 366 

DA ++ +F + FD+ILVDAPCSGIGMRRKPDI+Y KES F LQ IQLEIL+S 
Sbjct: 290 ADATMIYENFGPEKFDRILVDAPCSGIGLIRRKPDIRYRKESSDFIDLQKIQLE'ILNSAS 349 



Sbjct: 350 KSLKKSGIMVYSTCTIFDEENFDWEEFLENHPNFSQVEISNEKPEVIKEGCLFITPEMY 409 


Query: 427 QTDGFFIGQVRRV 439 

TDGFFI + +++ 
Sbjct: 410 HTDGFFIAKFKKI 422 

An alignment of the GAS and GBS proteins is shown below.

Identities = 305/440 (69%) , Positives = 370/440 (83%) 

Query: 61 TLEWYLSHFIvDRDKLEU'JVYHLLLLSLYQLLYLDNIPDHAIA^AVTIAKNRGNKKGAE 120 

+LEWYL+H++ DRDKL+ NVY+LL+LSLYQL YLD +P HAIVNDAV IAKNRGNKKGAE 
Sbjct: 61 SLEWYLAHWKDRDKLDraWYYLLMLSLYQLTYLDKLPAHAIVKDAVGIAKNRGNKKGAE 120 


Query: 121 KLINAVLRRVSSETLPEIASIKRQNKRYSVAYSMPVWLVKKtlDQYGETRALAIMESLFE 180 

K +NA+LR+ +S LP++ +IKR+NK YSV YS+PVWLVKKL DQ+G R++AIMESLF 
Sbjct: 121 KFVNAILRQFTSHPLPDMETIKRRNKYYSVKYSLPVWLVKKLEDQFGSDRSVA1MESLFV 180 

Query: 181 RNKASLRVTDLSQKQTIKETLNVRDSHIAETALVADSGNFASTSFFQDGLITIQDESSQL 240

R+KAS+RVTD + + + EL+ S++TIi SG+FA++ +F +G ITIQDESSQL 
Sbjct: 181 RSKASIRVTDPLKLEEVAEALDAERSLLSATGLTKAS3HFAASDYFTNGDITIQDESSQL 240 

Query: 241 VAPTLKVSGNDQVLDACSAPGGKTSHIASYLTTGAVTALDLYDHKLELVMENAKRLGLSD 300 
           VAPTL + G+D +LDACSAPGGKTSHIASYL TG V ALDLYDHKLELV ENA RLG++D

Sbjct: 241 VAPTLNIDGDDIILDACSAPGGKTSHIASYLKTGKVIALDLYDHKLELVKENANRLGVAD 300 

Query: 301 KIKTKKLDASKAHEYFLEDTFDKILVDAPCSGIGLIRRKPDIKYNKANQDFEALQEIQLS 360
            I+T+KLDA + H +F +D+FDKILVDAPCSGIGLIRRKPDIKYHK +Q F ALQ IQL
Sbjct: 301 NIETRKLDAREVHRHFEKDSFDKILVDAPCSGIGLIRRKPDIKYNKESQGFNALQAIQLE 360

Query: 361 ILSSVCQTLRKGGIITYSTCTIFEEENFQVIEKFLENHPNFEQVELSHTQEDIVKRGCIS 420 

ILSSVCQTLRKGGI ITYSTCTIF+EEN QVIE FL++HPNFEQV+L+HTQ DIVK G + 
Sbjct: 361 ILSSVCQTLRKGG1ITYSTCTIFDEENRQV-EAFLQSHPNFEQVKLNHTQADIVKDGYLI 420 


Query: 421 ISPEQYHTDGFFIGQVKRIL 440 

I+PEQY TDGFFIGQV+R+Ii 
Sbjct: 421 ITPEQYQTDGFFIGQVRRVL 440 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.




Example 986 

A DNA sequence (GBSx1046) was identified in S.agalactiae <SEQ ID 3019> which encodes the amino
acid sequence <SEQ ID 3020>. This protein is predicted to be pppL protein. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.5796 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 ME1SLLTDIGQRHSNNQDFINQFENJOVGVPLIILAEGMGGHEAGNIASEMTVTDLGSDWA 60 

ME S+L+DIG +RS NQD++ + N+AG L +LADGMGGH+AGN+AS++TV DLG W+ 
Sbjct: 1 MEYSILSDIGSKESTNQDWGTYVNRAGYQLFLLAreMGGHKAGNVASKXTVEDLGKLWS 60 

Query: 61 ETDF SELSEIRDWMLVSIETENRKIYELGQSDCYKGMGTTIEAVAIVGDNIIFAHVG 117 

ET F + + + W+ + EN I LG+ D+Y+GMGTT+EA+ I G+ 1+ AHVG 
Sbjct: 61 ETFFDAGTPEATLEIWLRNQVRNENENIASLGICLDEYQGMGTTLEALVIKGNTIVSAHVG 120 

Query: 118 DSRIGIVRQGEYHLLTSOHSLVNELVKAGQr.TEEEAASHPQKNIITQSIGQANPVEPDLG 177 

DSR ++R GE + +T+DHSIiV ELV AGQ+TEEEA HP KNIIT+S+GQ N V+ D+ 
Sbjct: 121 DSRTYLMRDGEUSIKITTDHSLVQELVDAGQITEEEAEVHPNKNIITRSLGQTNEVQADIQ 180 

Query: 178 VHLLEEGDYLVVNSDGLTIMLSNADIATVLTQEK-TLDDKNQDLITIANHRGGIjDNITVA 236 

L+ GD +++NSDGLTNM+S +1 VL +E TLD+K++ LI LAN GGLDNITV 
Sbjct: 181 ALELC^GDIILMNSDGLTNMVSITEIMEVLEREDLTLDNKSEALIRLANEHGGLDNITW 240 

Query: 237 LVYVE 241 

L+ E 
Sbjct: 241 LIKFE 245 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3021> which encodes the amino acid
sequence <SEQ ID 3022>. Analysis of this protein sequence reveals the following: 
Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5301 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
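
Each "Final Results" block lists one certainty score per candidate compartment, and the predicted localization is the entry with the highest score. The sketch below shows one way to read such a block programmatically. It is an illustration only: the regular expression and function name are assumptions, not part of the localization program itself, and stray spaces inside the number are tolerated because they occur in extracted text.

```python
import re

# One localization line: compartment name, certainty score, verdict in parentheses.
LINE_RE = re.compile(
    r"bacterial\s+(\w+)\W+Certainty\s*=\s*([0-9][0-9 .]*?)\s*\(([^)]+)\)"
)


def parse_localization(text):
    """Return a list of (compartment, certainty, verdict) tuples."""
    rows = []
    for site, score, verdict in LINE_RE.findall(text):
        rows.append((site, float(score.replace(" ", "")), verdict.strip()))
    return rows


report = """
bacterial cytoplasm --- Certainty=0.5301 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

rows = parse_localization(report)
best = max(rows, key=lambda row: row[1])  # highest certainty wins
print(best)
```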

An alignment of the GAS and GBS proteins is shown below. 

Identities = 180/245 (73%) , Positives = 220/245 (89%) 

Query: 1 MEISLLTDIGQRRSNNQDFINQFENKAGVPLIILADGMGGHRAGNIASEMTVTDLGSDWA 60 

M+ISL TD IGQ+RSNNQDF IN+F+NK G+ L+ILZVDGMGGHRAGNIASEMTVTDLG +W 
Sbjct: 1 MKISLKTDIGQKRSNNQDFINKFDNKKGITLVILADGMGGHRAGNIASEMTVTDLGREVJV 60 

Query: 61 ETDFSELSEIRDWMLVSIETENRKIYELGQSDDYKGKGTTIEAVAIVGDNIIFAHVGDSR 120 

+TDF+ELS+IRDW+ + I++EN++IY+LGQS+D+KGMGTT+EAVA+V + I+AH+GDSR 
Sbjct: 61 KTDFTELSQIRDWLFETIQSENQRIYDLGQSEDFKGMGTT\rEAVALVESSAIYAHIGDSR 120 

Query: 121 IGIVRQGEYHLLTSDHSLVNELVKAGQLTEE3ARSHPQKNI ITQS IGQANPVEPDLGVHL 180 

IG+V G Y LLTSDHSLVNELVKAGQ+TEEEAaSHPQ+NIITQSIGQA+PVEPDLGV + 
Sbjct: 121 IGLVHDGHYTLLTSDHSLVNELVKAGQITEEEAZ\SHPQRNIITQSIGQASPVEPDLGVRV 180 
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Query: 181 LEEGDYLWNSDGLTIMLSNADIATVTjTQEKTLDDKKQDLITLMSIHRGGLDNITVALVYV 240 

LE GDYLV+NSDGLTNM+SN +1 T+L + +LD+KNQ++I LAN RGGLDNI T+ALV+ 
Sbjct: 181 LEPGDYLVINSDGLTNMISNDEIVTILGSKVSLDEKNQEMIDLANLRGGLDNITIALVHN 240 

Query: 241 ESEAV 245

ESE V 
Sbjct: 241 ESEDV 245 
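
In these alignments the middle row marks each column: a letter denotes an identical residue, "+" a conservative substitution, and a blank a mismatch or gap. As a minimal sketch (the function name is hypothetical), the Identities and Positives counts for a block can be tallied from its three rows, shown here for the final five-residue block of the alignment above:

```python
def count_from_midline(query, midline, sbjct):
    """Tally (identities, positives, columns) for one aligned block.

    Identities are columns where both residues agree; positives are all
    columns the midline marks with a letter or '+'.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    positives = sum(1 for m in midline if m != " ")
    return identities, positives, len(query)


# Last block of the GAS/GBS alignment: residues 241-245.
ident, pos, length = count_from_midline("ESEAV", "ESE V", "ESEDV")
print(ident, pos, length)
```

Summing these tallies over every block of an alignment should reproduce the totals on its statistics line, provided the rows were extracted without damage.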

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 987 

A DNA sequence (GBSx1047) was identified in S.agalactiae <SEQ ID 3023> which encodes the amino
acid sequence <SEQ ID 3024>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.03 Transmembrane 346 - 362 ( 340 - 372)

Final Results 

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9539> which encodes amino acid sequence <SEQ ID 9540> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA10713 GB:AJ132604 hypothetical protein [Lactococcus lactis]
Identities = 219/380 (57%), Positives = 284/380 (74%), Gaps = 8/380 (2%)










EA AMAEL+HPNIV I D+GE + QQ++VME+VDG LK+YI NAPL+N+E + 1+ E+ 



LSAM +AH GI+HRDLKPQN+L++ GTVKVTDFGIA A +ETSLTQTN+M GSVHYLS 



PEQARGS ATVQSDIYA+GI+LFE+LTG IP+DGDSAV IAL+HFQ+ +PSI+ N VP 



QALEN+VIKATAK + +RY EM D++T+ S R E KLVEN D + TK +P 



++ T+ L+ K+ 



+PT ++VP+V+N 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3025> which encodes the amino acid 
sequence <SEQ ID 3026>. Analysis of this protein sequence reveals the following: 
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Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8. 60 Transmembrane 349 - 365 ( 340 - 370) 

Final Results

bacterial membrane --- Certainty=0.4439 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA10713 GB:AJ132604 hypothetical protein [Lactococcus lactis] 
Identities = 209/378 (55%) , Positives = 273/378 (71%) , Gaps = 8/378 (2%) 

Query: 1 MIQIGKLFAGRYRILKSIGRGGMADWLANDLILDNEDVAIKVLRTNyQTDQVAvARFQR 60 
         MIQIGK+FA RYRI+K IGRGGMA+VY D L + VAI KVLR+N+ + D +A+ARFQR

Sbjct: 1 MIQIGKIFADRYRIIKEIGRGGMANVYQGEDTFLG3RKVAIKVLRSNFENDDIAIARFQR 60 

Query: 61 E^RAMAEI^PNIVAIRDIGEEDGQQFLVMEYVDGADLKRYIQNHAPLSNNEvVRIMEEV 120 
EA AMAEL+HPNIV I D+GE + QQ++VME+VDG LK+YI +APL+N+E + 1+ E+ 
Sbjct: 61 EAFAMAELSHPNIVGISDVGEFESQQYIVMEFVDGMTLKQYINQNAPLANDEAIEIITEI 120

Query: 121 LSAMTLAHQKGIVHRDLKPQNILLTKEGWKVTDFGIAVAFAETSLTQTNSMLGSVHYLS 180 

LSAM +AH GI+HRDLKPQN+L+4 G VKVTDFGIA A +ETSLTQTN+M GSVHYLS 
Sbjct: 121 LSAMDMAHSHGIIHPJDLKPQNVLVSSSGTVKOTDFGIAKALSETSLTQTNTMFGSVHYLS 180 


Query: 181 PEQARGSKATIQSDIYAMGIMLFEMLTGHIPYDGDSAVTIALQHFQKPLPSIIEENHNVP 240 

PEQARGS AT+QSDIYA+GI+LFE+LTG IP+DGDSAV IAL+HFQ+ +PSII N VP 
Sbjct: 181 PEQARGSNATVQSDIYAIGIILFELLTGQIPFDGDSAVAIALKHFQENIPSIINLNPEVP 240 

Query: 241 QALENWIRATAKiOiSDRYGSTFEMSRDIjMTALSYNRSRERKIIF-ENVESTKPLPKVAS 299

QALENWI +ATAK +++RY EM D+ T+ S +R E K++F ++ + TK +P 
Sbjct: 241 CJALENWIKATAKDINNRYADVEEMMTDVATSTSLDRRGEEKIjVFNKDHDETKIMPANLI 300 

Query: 300 GPTASVKLSPPTPTVLTQESRLDQTNQTDALQPPTKKKKSGRFLGTLFKILFSFFIVGVA 359 
           P + L QE +4+ T+ + KK K G + + +L ++G

Sbjct: 301 NPYDTKPLIDKKTD - -DQEKAQSESSTTENNKNKNKKSKKGLI I SLWLLL VIGGG 354 

Query: 360 LFTYLILTKPTSVKVPNV 377 
F + + T PT+VKVPNV 
Sbjct: 355 AFAWAVST-PTNVKVPNV 371

An alignment of the GAS and GBS proteins is shown below. 

Identities = 390/643 (60%) , Positives = 480/643 (73%) , Gaps = 29/643 (4%) 

MIQIGKLFAGRYRILKSIGRGGMADWIARDLILDNEEVAIKVLRTNYQTDQIAVARFQR 60 
MIQIGKLFAGRYRILKS IGRGGMADVYLA DLILDNE+VAI KVLRTNYQTDQ+AVARFQR 
MIQIGKLFAGRYRILKSIGRGGKADVYIjANDLILDNEDVAIKVLRTNYQTDQVAVARFQR 60 



LSAM+LAHQKGIVHRDL.KPQNILLTK+G VKVTDFG I AVAFAETSLTQTNSMLGSVHYLS 



PEQARGSKAT+QSDIYAMGIMLFEMLTGHIPYD3DSAVTIALQHFQKPLPSI+ EN +VP 





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: 301 PTASVKLSPPTPTVLTQESRL DQTNQTDALQPPT KKKKSGRFLGTLFKI 349 



G W+TDP AG ++R+G+ + LY++ NK F 






++R + N+Y T+++QS G FNP+G K+TL+VAV+D + MP VT + 



T LG+DA 4- Y 



+T ++SSS+ SS SSS ++ +DS 4- ++ 



SEQ ID 3024 (GBS297) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 43 (lane 6; MW 75kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 27 (lane 4; MW 100.2kDa) and in
Figure 159 (lanes 2-4; MW 100kDa). GBS297-GST was purified as shown in Figure 223, lane 3. GBS297-His
was purified as shown in Figure 203, lane 8.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 988 

A DNA sequence (GBSx1048) was identified in S.agalactiae <SEQ ID 3027> which encodes the amino
acid sequence <SEQ ID 3028>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.91 Transmembrane 60 - 76 ( 50 - 90) 
INTEGRAL Likelihood = -7.43 Transmembrane 7 - 23 ( 3-25) 
INTEGRAL Likelihood = -5.68 Transmembrane 27 - 43 ( 24 - 46) 
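
Each INTEGRAL line gives a likelihood value, the residue range of a predicted membrane-spanning segment, and a wider range in parentheses. The following sketch extracts the segments and orders them along the sequence. It assumes the line layout shown above; the class and function names are illustrative rather than taken from the prediction program.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return the predicted segments sorted by start position."""
    segs = [Segment(float(v), int(s), int(e)) for v, s, e in TM_RE.findall(text)]
    return sorted(segs, key=lambda seg: seg.start)


report = """
INTEGRAL Likelihood = -7.91 Transmembrane 60 - 76 ( 50 - 90)
INTEGRAL Likelihood = -7.43 Transmembrane 7 - 23 ( 3-25)
INTEGRAL Likelihood = -5.68 Transmembrane 27 - 43 ( 24 - 46)
"""

segments = parse_segments(report)
lengths = [seg.end - seg.start + 1 for seg in segments]
print(segments, lengths)
```

For the three segments listed here each predicted span covers 17 residues, which gives a simple consistency check when a residue number has been damaged in extraction.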



Final Results 

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03323 GB:AB035448 hypothetical protein [Staphylococcus 
aureus] 

Identities = 53/230 (23%), Positives = 104/230 (45%), Gaps = 14/230 (6%)

Query: 5 QFFLLVEAWLVMGLMKILSDDWTSFIFILftL- -ILLALRF-YNNDSRHNFLLTTSLLLL 61 

Q ++ A++++ I + F+ +L L +L+ + + Y + R LL+ 

Sbjct: 9 QMLI IFTALMI IANFYYIFFEK- IGFLLVLLLGCVLVYVGYLYFHKIRGLLAFWIGALLI 67 

Query: 62 FLIFMIJIPY-IIAAWFAVLYVLIiraFSQVTaCKNRYALIQFKNHQLDVKTTRNQWLGTDQ 120 

+ N Y II VF +L ++ + K K A + +K +W G + 
Sbjct: 68 AFTLLSNKYTIIILFVFLLLLIVRYLIHKFKPKKVVATDEVMTSPSFIK QKWFGEQR 124 

Query: 121 HESDFYAFEDINIIRISGTDTIDLTNVIVSGQDNVIIIQKVFGDTKVLVPLDVAVKADIS 180 
Y +ED+ I G IDLT ++N I+++ + G +V++P++ + ++ 
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Sbjct: 125 TEVYVYK^DVQIQHGIGDLHIDLTKAaNIKEWTIVVRHILGKVQVILPWYNINLHVA 184 

Query: 181 SWGSVQYFDFEEYDLRKESIKLSQ--EEEYYLliKRVKLVVMTIAGK\7EV 228 

+ YGS Y++Y+N+I++ + + Y V + V+T G VEV 
Sbjct: 135 AFYGST-YWEKSYKVENNNIHTEEMMKPDNY TVNIYVSTFIGDVEV 230 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3029> which encodes the amino acid 
sequence <SEQ ID 3030>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.92 Transmembrane 44 - 60 ( 36 - 64)

INTEGRAL Likelihood = -8.76 Transmembrane 69 - 85 ( 66 - 105) 

INTEGRAL Likelihood = -8.70 Transmembrane 24 - 40 ( 20 - 42) 

INTEGRAL Likelihood = -6.64 Transmembrane 88 - 104 ( 85 - 105) 

Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB03323 GB:AB035448 hypothetical protein [Staphylococcus 
aureus] 

Identities = 41/187 (21%), Positives = 85/187 (44%), Gaps = 22/187 (11%) 

FILILVh- -ILIALRF-YNQDSRNNFTjLTVSLLFLFLIFMLNPYIIMAVLLGIVYIFINH 103 
F+L+L+L +L+++Y R +L+ +NYI+ ++++++ 

FLLVLLLGCVLVYVGYLYFHKIRGLLAFWIGALLIAFTLLSNKYTIIILFVFLLLLIV-- 90 

FSQVKKKNRFALIRFKEEKIEVNNT KHQWIGTANYESDYYCFDDINI IRI SG 155 

R+ + +FK +K+ + K +W G Y ++D+ I G 
RYLIHKFKPKKWATDE VMTS PS FI KQKWFGEQRTPVYVYKWEDVQIQHGIG 142 








+N IV+R I G +++P++ + L V++ YGS 



An alignment of the GAS and GBS proteins is shown below. 
Identities = 137/211 (64%) , Positives = 175/211 (82%) 

Query: 1 MKKFQFFLLVFAVVLVMGLMKILSDDWTSFIFILALILIALRFYNNDSRHNFLLTTSLLL 60 

MKKFQFFLL+E ++L MG+M IL +D +SFI IL LILLALRFYN DSR+NFLLT SLL 
Sbjct: 18 MKKFQFFLLIECILLAMGIMTILDNDLSSFILILVLILLALRFYNQDSRNNFLLTVSLLF 77 

Query: 61 LFLIFMLNPYIIAAWFAVLYVLINHFSQVKKKWYALIQFKNHQLDVKTTRNQWLGTDQ 120 

LFLIFMLNPYII AV+ ++Y+ INHFSQVKKKNR+ALI+FK +++V T++QW+GT 
Sbjct: 78 LFLIFMLNPYIIMAVLLGIVYIFINHFSQVKKKNRFALIRFKEEKIEVNNTKHQWIGTAN 137 

Query: 121 HESDFYAFEDINI IRI SGTDT I DLTNVTVSGQDNVT I IQKVFGDTKVLVPLDVAVKAD I S 180 

+ESD+Y F+DINIIRISG DT+DLTNVIV+G DN+I+I+K+FG+T +LVP+DV V D+S 
Sbjct: 138 YESDYYCFDDINIIRISGHDTVDLTOTIVTGMDNII VIRKIFGNTTILVPIDVTVTLDVS 197 

Query: 181 SVYGSVQYFDFEEYDLRHESIKLSQEEEYYL 211 

S+YGSV +F ++YDLRNESIK + + L 
Sbjct: 198 S IYGS VDFFRCQQYDLRNES IKFKETDNQSL 228 

SEQ ID 3028 (GBS66) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 6 (lane 4; MW 25kDa) and in Figure 7 (lane 2; MW 24.7kDa). 
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Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 989 

A DNA sequence (GBSx1049) was identified in S.agalactiae <SEQ ID 3031> which encodes the amino
acid sequence <SEQ ID 3032>. This protein is predicted to be histidine kinase (narQ). Analysis of this 
protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.41 Transmembrane 47 - 63 ( 40 - 72) 
INTEGRAL Likelihood = -9.98 Transmembrane 9- 25 ( 5- 36) 

Final Results 

bacterial membrane --- Certainty=0.5564 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKKHHYFLAFFYGSVIIFAICFVIIDSLGVNL-VHLYQTSRLWLIEQLIFSIFFLSIAVT 59 

MKK Y + . + +F +++ L + + L+ + E+ +F + S+++T 

Sbjct: 1 MKKQAYVIIALTSFLFVFFFSHSLLEILDFDWSIFLHDVEKT---EKFVFLLLVFSMSMT 57 

Query: 60 ILLLLTOFLLDDNSKRQINHNLRRILNNQSINVTDDGTE1STNIQRLSKKMNIMTASLQS 119 

LL L W +++ S R++ NL+R+L Q + D ++ + + LS K+NL+T +LQ 
Sbjct: 58 CLIjALFWRGIEELSLRKMQANLKRLIAGQEWQVAD-PDLDASFKSLSGKLNLLTEALQK 116 

Query: 120 KENSRILKSQEIVKQERKRIARDLHDTVSQDLFAASMVLSGIAQNVSQLDVDQVGSQLLA 179 

EN + + +EI++4ERKRIARDLHDTVSQ+LFAA M+LSGI+Q +LD +++ +QL + 
Sbjct: 117 AENQSLAQEEEIIEKERKRIARDLHDTVSQELFAAHMILSGISQQALKLDREKMQTQLQS 176 

Query: 180 VEEMLQHAQNDLRILLLHLRPVELENKTLSEGFRMILKELTDKSDIEWYHESILTLPKK 239 

V +L+ AQ DLR+LLLHLRPVELE K+L EG +++LKEL DKSD+ V +++ LPKK 
Sbjct: 177 VTAILETAQKDLRVLLLHLRPVELEQKSLIEGIQILLKELEDKSDLRVSLKQNMTKLPKK 236 

Query: 240 IEDNIFRIGQEFISNTLKHSQASRLEVYLNQTENELQLKMIDNGIGFDMDSVYDLSYGLK 299 

IE++IFRI QE ISNTL+H+QAS L+VYL QT+ ELQLK++DNGIGF + S+ DLSYGL+ 
Sbjct: 237 IEEHIFRILQELISNTLRHAQASCLD\TLYQTDVELQLKWDNGIGFQLGSLDDLSYGLR 296 

Query: 300 NIEDRVEDLAGNLQLLSQPGKGVAMDIRLPLVNQ 333 

NI++RVED+AG +QLL+ P +G+A+DIR+PL+++ 
Sbjct: 297 NIKERVEDMAGTVQLLTAPKQGLAVDIRIPLLDK 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2991> which encodes the amino acid
sequence <SEQ ID 2992>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-14.22 Transmembrane 49 - 65 ( 42 - 70) 
INTEGRAL Likelihood = -6.58 Transmembrane 8 - 24 ( 5 - 33)



Final Results

bacterial membrane --- Certainty=0.6689 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 218/337 (64%) , Positives = 276/337 (81%) , Gaps = 3/337 (0%) 
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Query: 1 MKIOfflYFIAFFYGSVIIFAICFVIIDSLGVNLYKLYQTSRLWLIEQLIFSIFFLSIjAvTI 60 

MKK +Y L + Y ++ I +1 FV++D+LG+ +L + LW +E+L FSI L ++VT+ 
Sbjct: 1 MKKRYYALWLYSTITILSIVFVVMDNLGITFNYL- -RNHLWQVERLGFSILLLIVSVTL 58 

Query: 61 LLLLTWFLLDDNSKRQINHNI^RIMJQSINVTDDGTEISTNIQRLSKKMNLMTASLQSK 120 

LLLL W +4DDNSKR IN NL+ ILNN+ + + D+ +EI+TN+ RLSKKM+ +TA++Q K 
Sbjct: 59 LLLLLWIIMDDNSKRNINQNLKYILTO^RIjYL-DETSEINTNLSRLSKKMSHLTANMQKK 117 

Query: 121 ENSRILKSQEIVKQERKRIARDLHDTVSQDLFAASMVLSGIAQNVSQLDVDQVGSQLEiAV 180 

E++ IL SQE+VKQERKRIARDLHDTVSQ+LFA+S++LSGI+ ++ QLD Q+ +QL V 
Sbjct: 118 ESAYILDSQEWKQERKRIARDLHDTVSQELFASSLILSGISMSLEQLDKTQLQTQLTTV 177 

Query: 181 EEMLQHAQNDLRILLLHLRPVELENKTLSEGFRMILKELTDKSDIEWYHESILTLPKKI 240 

E MLQ+AQNDLRILLLHLRP EL N+TLSEG MILKELTDKSDIEV+Y E+I LPK + 
Sbjct: 178 EAMLQNAQNDLRILLLHLRPTELANRTLSEGLHMILKELTDKSDIEVIYKETIAQLPICrM 237 

Query: 241 EDNIFRIGQEFISNTLICHSQASRLEVYLNQTENELQLKMIDNGIGFDMDSVYDLSYGLKN 300 

EDN+FRI QEFISNTLKH++ASR+EVYLNQT ELQLKMID+G+GFDMD V DLSYGLKN 
Sbjct: 238 EDNLFRIAQEFISNTLKHAKASRIEVYLNQTSTELQLKMIDDGVGFDMDQWDLSYGLKN 297 

Query: 301 IEDRVEDLAGNLQLLSQPGKGVAMDIRLPLVNQSEDK 337 

IEDRV DLAGNL L+SQ GKGV+MDIRLP+V +D+ 
Sbjct: 298 IEDRVNDLAGNLHLISQKGKGVSMDIRLPIVKGDDDE 334 

A related GBS gene <SEQ ID 8701> and protein <SEQ ID 8702> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 14.69 
GvH: Signal Score (-7.5): -4.31 

Possible site: 19 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 2 value: -11.41 threshold: 0.0 

INTEGRAL Likelihood =-11.41 Transmembrane 47 - 63 ( 40 - 72) 

INTEGRAL Likelihood = -9.98 

PERIPHERAL Likelihood = 3.61 
modified ALOM score: 2.78 




Final Results 

bacterial membrane --- Certainty=0.5554 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 

52.5/77.6% over 288aa 

Streptococcus pneumoniae 

GP| 5830526 | histidine kinase Insert characterized 


ORF00320(433 - 1302 of 1617) 

GP|5830526|emb|CAB54570.1||AJ006393 (43 - 331 of 331) histidine kinase {Streptococcus
pneumoniae}
%Match =28.6
%Identity =52.4 %Similarity =77.6

Matches = 152 Mismatches = 64 Conservative Sub.s = 73 



QEEEYTF*NVSN*L*TLSLES*G*S*MKKHHYFLAFFYGSVI IFAI CFVI IDSLGVNLVHLYQTSRLWLIEQLIFSIFFL 
= :| I ::: : |:::| :: : 

MKKQAWIIALTSFLFVFFFSHSLLEILDFDWSIFLHDVEKTEKFVFLLLVF 



492 522 552 582 612 642 672 702 

65 SIAVTILLLLTWFLLDDNSKRQINHNLRRILNNQSINVTDDGTEIS 
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|:::| || | | ::: | |:= ||:|:| | : | :: : : || |:||:| :|| || : : :||::: 

SMSMTCLI^FWRGIEELSLRKMQAHLKRLliaGQEWQVAD-PDLDASFKSLSGKI^IjLTEALQKAENQSLAQEEEIIEK 



erkriardlhdwsqdlfaasmvlsgiaqotscldvdqto^ 

IIIIIIII1II1I1HIII 1 = 1111 = 1 =11 = = = =11 =1 =1= II 111 = 11111111111 1 = 1 II = = 

ERKRIARDLHDWSQELFAAHMILSGISQQALI<LDREI<MQTQLQSVTAILETAQKDLRVLLLHLRPVELEQKSLIEGIQI 



=1111 1111= I === lllll|:=llll 11=11111=1=111 1=111 11= llll|::|||||| = 1= II 
LLKELEDKSDLRVSLKQNMTKLPKKIEEHIFRILQELISOTLRHAQASCLDWLYQTDVELQLKVVDNGIGFQLGSLDDL 



1111=11 =1111=11 =111= I =|=|=|||:||=== 
SYGLRNIKERVEDMAGTVQLLTAPKQGLAVDIRIPLLDKE

310 320 330 

SEQ ID 8702 (GBS31) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 15 (lane 8; MW 64kDa). It was also expressed as GBS31d in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 151 (lanes 8-10; MW 59kDa) and
in Figure 187 (lane 8; MW 59kDa). GBS31d was also expressed in E.coli as a His-fusion product. SDS-PAGE
analysis of total cell extract is shown in Figure 151 (lanes 11-13; MW 34kDa) and in Figure 182 (lane
11; MW 34kDa). Purified GBS31d-GST is shown in lane 3 of Figure 237.

Example 990 

A DNA sequence (GBSx1050) was identified in S.agalactiae <SEQ ID 3033> which encodes the amino
acid sequence <SEQ ID 3034>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2705 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.










I KI VL VDDHEMVRLGLKSFLNLQADVEVIGEASNGLEGI KKALELRPDWVMDLVMPEMD 6 7 
+KI+LVDDHEMVRLGLKS+ +LQ DVEV+G3ASNG +GI ALELRPDV+VMD+VMPEM+ 
MKILLVDDHEMVRLGLKSYFDLQDDVEWGEASKGSQGIDLALELRPDVIVMDIVMPEMN 6 0 



G++ATLA+LK4WPEA IL++TSYLDNEKI PV++AGAKGYMLKTSSA E+L+A+ KV+ G 



LHE LTARERD+L L+AKGY+NQRIAD+LF1SI 



SNI LGKLNVADRTQA VVYAFQHHLVPQDD 216 
SNIL KL V+DRTQA VYAFQHHLV Q++ 
SNILAKLEVSDRTQAAVYAFQHHLVGQEE 209 
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A related DNA sequence was identified in S.pyogenes <SEQ ID 2995> which encodes the amino acid 
sequence <SEQ ID 2996>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3094 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 175/212 (82%) , Positives = 192/212 (90%) 

Query: 5 MDKIKIVLVDDHEMWLGLKSFLNLQADVEVIGEftSNGLEGIKKALELRPDWVMDLVMP 64 

M KIK++LVDDHEMW+GLKSFtNLQAD++V4GEASNG EG+ AL L+PDV+VMDLVMP 
Sbjct: 3 MSKI KVI LVDDHEMVRMGLKS FLNLQAD I D WGEASNGREGVDIiAIALKPD VL VMDL VMP 62 

Query: 65 ETOGVEATLALLKEWPFJU^ILVLTSYLDNEKIYPVIFAGAKGMLKTSSAAEILNAIRKU' 124 

E+ GVEATL +LK W EA +LVLTSYLDNEKIYPVI+AGAKGYMLKTSSAAEILNAIRKV 
Sbjct: 63 ELGGVFATLEVLKKWKE^KVLVLTSYLDNEKIYPVIDAGAKGYMLKTSSAAEILNAIRKV 122 

Query: 125 SRGEQAIENEVDKKIKAHDKCPALHEGLTARERDILNLLAKGYDNQRIADELFISLKTVK 184 

S+GE AIE EVDKKI KAHD+ P LHE LTARE DIL+LLAKGYDNQ IADELF I SLKTVK 
Sbjct: 123 SKGEIAIETEVDKKIKAHDQHPDI.HEELTAREYDILHLIAKGYDNQTIADELFISLKTVK 182 

Query: 185 THVSNI LGKLNVADRTQAWYAFQHHLVPQDD 216 

THVSNIL KL V DRTQAWYAF+HHLVPQDD 
Sbjct: 183 THVSNI LAKLEVGDRTQAWYAFRHHLVPQDD 214 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 991 

A DNA sequence (GBSx1051) was identified in S.agalactiae <SEQ ID 3035> which encodes the amino
acid sequence <SEQ ID 3036>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1688 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB08166 GB:Z94864 putative peptidyl-prolyl cis-trans isomerase
[Schizosaccharomyces pombe] 
Identities = 81/174 (46%) , Positives = 109/174 (62%) , Gaps = 30/174 (17%) 

Query: 288 IKTNHGDMTVKLFPDHAPKTVANFIGLAKQGYYDGIIFHRIIPDFMIQGGDPTGTGMGGE 347 

++T+ G + ++L+ +HAPKT NF LAK+GYYDG+IFHR+IPDF+IQGGDPTGTG GG 
Sbjct: 6 LQTSLGKILIELYTEHAPKTCQNFYTLAKEGYYDGVIFHRVIPDFVIQGGDPTGTGRGGT 65 

Query: 348 SIYGESFEDEFSEELYW-RGALSMANAGPNTNGSQFFIVQNTKIPYAKKELERGGWPTP 406 

SIYG+ F+DE +L++ G LSMANAGPNTN SQFFI T P 
Sbjct: 66 SIYGDKFDDEIHSDLHHTGAGILSMANAGPNTNSSQFFI TIAP 108 

Query: 407 IAELYAGQGGTPHLDRRHSVFGQLVDQSSFEVLDEIAAVETGSQDKPLEDWIL 460 

TP LD +H++FG++V S V + + T S D4P+E + 1+ 
Sbjct: 109 TPWLDGKHTIFGRW--SGLSVCKRMGLIRTDSSDRPIEPLKII 150 
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A related DNA sequence was identified in S.pyogenes <SEQ ID 3037> which encodes the amino acid 
sequence <SEQ ID 3038>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2175 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 381/464 (82%), Positives = 422/464 (90%) 

Query: 1 MDAKTKYKAKKIKAVFFDIDDTLRVKDTGYMPPSILKVFKALKDKGIWGIASGEARYGV 60 

MDAK KYKAKKIK VFFDIDDTLRVKDTGYMP SI +VFKALK KGI+VGIASGRARYGV 
Sbjct: 5 MDAKLKYKAKKIKMVFFDIDDTLRVKDTGY1-1PESIQRVFKALKAKGILVGIASGRARYGV 64 

Query: 61 PKEVQDLNADYC^nQ^GAWKDKDKNIIFHRPIPAEYVEQYKKWADTVGIKYGLAGRHEA 120 

P+EVQDL+ADYCTKLNGAYVKD K IIF PIPA+ V YKKWAD +GI YG+AGRHEA 
Sbjct: 65 PQEVQDLHADYCVKLNGAYVKDDAKTI IFQAPI PADVWAYKKWADDMGI FYGMAGRHEA 124 

Query: 121 VLSDPX1DLVNDAIDIVYSDLEVNPDFNKEHDIYQMWTFEDKGDSLHLPEPIAEHLRLIRW 180 

VLS R+D++++AID VY+ LEV PD+N+ HD+YQMWTFEDKGD L LP LAEHLRL+RW 
Sbjct: 125 VLSARNDMISNAIDNVYAQLEVCPDYNEYHDWQMWTFEDKGDGLQLPAELAEHLRLVRW 184 



241 AQKKADFITKKVEEDGILYALEELGLIEKELTFPQVDIENTEGPVAVIKTNHGDMTVKLF 300 

Q+KADFITKKVEEDGILYALEBLGLI+KEL FPQ+D+ N +GP A IKTNHGDMT+ LF 
245 LQEKADFITKKVEEDGILYALEELGLIDKELQFPQLDLPNHKGPKATIKTNHGDMTLVLF 304 

301 PDHAPKTVANFIGLAKQGYYDGI I FHRI I PDFMIQGGDPTGTGMGGES I YGESFEDEFSE 360 

PDHAPKTVANF+GLAK+GYYDGI I FHRI I P+FMIQGGDPTGTGM G+SIYGESFEDEFS+ 
305 PDHAPKTVANFLGLAKEGYYDGI I FHRI I PEFMIQGGDPTGTGMCGQS I YGESFEDEFSD 364 

361 ELYNTOGALSMANAGPNTNGSQFFIVQNTKIPYAKKELERGGWPTPIAELYAGQGGTPHL 420 

ELYN+RGALSMANAGPNTNGSQFFIVQN+KIPYAKKELERGGWP PIA YA GGTPHL 
365 ELYNLRGALSMANAGPNTNGSQFFIVQNSKIPYAKKELERGGWPAPIAASYAANGGTPHL 424 




Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 992 

A DNA sequence (GBSx1052) was identified in S.agalactiae <SEQ ID 3039> which encodes the amino
acid sequence <SEQ ID 3040>. This protein is predicted to be ribosomal protein S1 (rpsA). Analysis of this
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3126 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
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The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07066 GB:AP001518 polyribonucleotide nucleotidyltransferase 
(general stress protein 13) [Bacillus halodurans] 
Identities = 46/120 (38%) , Positives = 71/120 (58%) , Gaps = 11/120 (9%) 

Query: 8 KIGDKLKGTVTGIRPYGAFVSLEDGRTGLIHISEIKTGYIDNIYDVLSVGDEVYVQVIDV 67 

++G ++G VTGI+P+GAFV+4+D + GL+HISE+ G++ +1 DVLSVGDEV V+++ V 
Sbjct: 5 EVGSIVEGKVTGIKPFGAFVA1DDQKQGLVHISEVAHGFVKDINDVLSVGDEVKVKILSV 64 

Query: 68 DEFTQKASLSLRTLEEERHHIQH RHRFSNNRLKIGFKPLEENLPSWVEE 116 

DE + K SLS+R +E R GF LE+ L W+++ 

Sbjct: 65 DEESGKISLSIRATQEAPERPARAPKPRPAGGGGRKPQKGQSQGQGFNTLEDKLKEWLKQ 124 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3041> which encodes the amino acid 
sequence <SEQ ID 3042>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1832 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 78/115 (67%) , Positives = 100/115 (86%) 

Query: 7 MKIGDKLKGTVTGIRPYGAFVSLEDGRTGLIHISEIKTGYIDNIYDVLSVGDEVYVQVID 66
         MKIGDKL GT+TGI+PYGAFV+LE+G TGLIHISEIKTG+ID+I +L++G++V VQVID
Sbjct: 1 MKIGDKLHGTITGIKPYGAFVALENGTTGLIHISEIKTGFIDDIDQLIAIGNQVLVQVID 60

Query: 67 VDEFTQKASLSLRTLEEERHHIQHRHRFSNNRLKIGFKPLEENLPSWVEEGIAYL 121
          +DE+++K SLS+RTL EE+ H HRHR+SN+R KIGF+PLEE LP W+EE L +L
Sbjct: 61 IDEYSKKPSLSMRTLAEEKQHFFHRHRYSNSRHKIGFRPLEEQLPQWIEESLQFL 115

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
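
The percentages quoted in the alignment headers can be re-derived from the raw counts. The following is a minimal Python sketch and is not part of the original analysis. The function name `header_percentages` is hypothetical, and the rounding rule is an assumption inferred from the figures quoted in this section rather than taken from BLAST documentation: the identity percentage is truncated, and the positives percentage is the truncated identity percentage plus the truncated percentage of the additional conservative matches.

```python
import re

# Matches the "Identities = a/n ... Positives = b/n" part of a header line.
HEADER = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)


def header_percentages(line):
    """Return (identity %, positives %) recomputed from a header line."""
    m = HEADER.search(line)
    if m is None:
        raise ValueError("not an Identities/Positives header line")
    ident, length, pos, length2 = map(int, m.groups())
    if length != length2:
        raise ValueError("identity and positive counts use different lengths")
    ident_pct = 100 * ident // length
    # Assumed rule: the extra conservative matches are truncated separately
    # and added to the identity percentage.
    pos_pct = ident_pct + 100 * (pos - ident) // length
    return ident_pct, pos_pct


line = "Identities = 78/115 (67%) , Positives = 100/115 (86%)"
print(header_percentages(line))
```

A header whose printed percentage disagrees with the recomputed value is a candidate for a transcription error in one of the counts.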



Example 993 

A DNA sequence (GBSx1053) was identified in S.agalactiae <SEQ ID 3043> which encodes the amino 
acid sequence <SEQ ID 3044>. This protein is predicted to be pyruvate formate-lyase 2 activating enzyme 
(pflA). Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2889 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76934 GB:AE000469 probable pyruvate formate lyase activating
enzyme 2 [Escherichia coli K12]
Identities = 90/251 (35%), Positives = 142/251 (55%), Gaps = 16/251 (6%)

Query: 8 VFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQKMVPETMR--------------- 52
         +FNIQ +S++DG GIRT VF KGCP CPWCANPES +T+R
Sbjct: 24 IFNIQRYSLNDGEGIRTVVFFKGCPHLCPWCANPESISGKIQTVRREAKCLHCAKCLRDA 83

[Remainder of alignment not recovered. Surviving block coordinates: Query 53 / Sbjct 84; Query 112 / Sbjct 144; Query 172 / Sbjct 204; Query 232 / Sbjct 264. Surviving match-line fragment: L+IRQ+ LLPFHQ+G+ KY+LL]

A related DNA sequence was identified in S.pyogenes <SEQ ID 3045> which encodes the amino acid 
sequence <SEQ ID 3046>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2209 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 187/255 (73%) , Positives = 220/255 (85%) 

EKGIVFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQKMVPETMRDAITNESVIVG 63
++GIVFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQ+ PE M + + IVG
DRGIVFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQQKAPEQMLTSDGLNTKIVG 62

+ +HEQF+ L+ YVDFIYTDLKHYN L+HQ+ T V+N IIKNIHYAF GK IVLRIPV

IP FNDSL+DA+ F+ LF++L+I QVQLLPFHQFG+NKY+LL R+YEM E+ A HPEDL

DYQA+F +NIHCYF
DYQAVFLNHNIHCYF 257

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 994 

A DNA sequence (GBSx1054) was identified in S.agalactiae <SEQ ID 3047> which encodes the amino 
acid sequence <SEQ ID 3048>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1762 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9299> which encodes amino acid sequence <SEQ ID 9300> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC743S6 GB:AE000226 putative DEOR-type transcriptional
regulator [Escherichia coli K12]
Identities = 74/177 (41%), Positives = 113/177 (63%), Gaps = 1/177 (0%)

Query: 2 1TOLEOTISLVSQYQKIDVNTLSELLQVSKVTIRKDLDKLEGKGLLHREHGYAVLNSGDDL 61
         +R + I+ +V ++ V L++ VS+VTIR+DL+ LE L R HG+AV DD+
Sbjct: 3 SRQQTILQ^W1DQGQVSVTDLAKATGVSEVTIRQDL1^^^LEKLSYLRRAHGFAVSLDSDDV 62

Query: 62 NVRLSFNHKTKKEIAALAANMVSDNDTILIESGSTCALLAENICQTKRNVTILTNSCFIA 121
          R+ N+ K+E+A AA++V +TI IE+GS+ ALLA + + K+NVTI+T S +IA
Sbjct: 63 ETIWMSNYTLKREIAEFAASLVQPGETIFIEMGSSNALIARTLGEQKKNVTIITVSSYIA 122

Query: 122 NYLREYDSCQIVLLGGEYQSSSQVTVGPLLKKMISLFHVSLAFVGTDGFDPKTRIYG 178
           + L++ C+++LLGG YQ S+ VGPL ++ I H S AF+G DG+ P+T G
Sbjct: 123 HLLKD-APCEVILLGGVYQKKSESMVGPLTRQCIQQVHFSKAFIGIDGWQPETGFTG 178

A related DNA sequence was identified in S.pyogenes <SEQ ID 3049> which encodes the amino acid 
sequence <SEQ ID 3050>. Analysis of this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2888 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/171 (76%) , Positives = 150/171 (87%) 

Query: 1 MmLmilSLVSQYQKIDVNTLSELLQVSKVTIRKDLDKLEGKGLLHREHGYAVLNSGDD 60
         MNRLE II LVSQ +KIDVN+LSE L VSKVTIRKDLDKLE KGLL REHGYAVLNSGDD
Sbjct: 2 MNRLERIIQLVSQKKKIDVNSLSEQLDVSKVTIRKDLDKLESKGLLRREHGYAVLNSGDD 61

Query: 61 LNVRLSFNHKTKKEIAALAANMVSDNDTILIESGSTCALLAENICQTKRNVTILTNSCFI 120
          LNVRLS+N+ K+ IA AA +V DNDTI+IESGSTCALLAE +CQTKRN+ ++TNSCFI
Sbjct: 62 IJWRLSYOTNIKRRIAEKAAELVQDNDTIMIESGST 121

Query: 121 ANYLREYDSCQIVLLGGEYQSSSQVTVGPLLKKMISLFHVSLAFVGTDGFD 171
           ANY+R+Y SCQI+LLGG YQ +S+VTVGPLLK+MISLFHV+ FVGTDGF+
Sbjct: 122 ANYIRQYSSCQIILLGGYYQPNSEVTVGPLLKEMISLFHVNRVFVGTDGFN 172

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
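
Each "Final Results" block in these examples lists one certainty value per compartment, and the predicted localisation is the entry with the highest certainty (the one marked Affirmative). A minimal Python sketch of reading such a block is given here for illustration only. The names `parse_final_results` and `predicted_location` are hypothetical, and the tolerance for stray spaces inside the number is an assumption made to cope with scanned text.

```python
import re

# One result line: compartment name, certainty value, verdict in parentheses.
LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\((.*?)\)"
)


def parse_final_results(text):
    """Map each compartment to a (certainty, verdict) pair."""
    results = {}
    for m in LINE.finditer(text):
        compartment, raw, verdict = m.groups()
        # Scanned values may contain spaces, e.g. "0. 2888".
        value = float(raw.replace(" ", ""))
        results[compartment] = (value, verdict.strip())
    return results


def predicted_location(results):
    """Return the compartment with the highest certainty."""
    return max(results, key=lambda name: results[name][0])


block = """\
bacterial cytoplasm --- Certainty=0. 2888 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(parse_final_results(block)))
```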



Example 995 

A DNA sequence (GBSx1055) was identified in S.agalactiae <SEQ ID 3051> which encodes the amino 
acid sequence <SEQ ID 3052>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1672 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG04879 GB:AE004578 probable transcriptional regulator 
[Pseudomonas aeruginosa] 
Identities = 20/70 (28%), Positives = 40/70 (56%)

Query: 6 GFMGRDLMRSEVAQEMANAADEVIILTDSSKFNQTALVEQLPLSTVSQVITDKHPNSEIA 65 

G M + +E+A+ M A ++ ++ DSSK + AL + PLS +++++ D+ P E+ 
Sbjct: 179 GAMDFSIEEAEIARAMIAQAEQLTVIADSSKLGRRALFQVFPLSRINRLVVDRKPTGELW 238 

Query: 66 NLFQEAEITI 75 

Q+A + + 
Sbjct: 239 EALQQARVEV 248 

There is also homology to SEQ ID 3050. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
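
The Identities count in an alignment header is the number of columns in which the query and subject residues are the same. The following Python sketch, offered only as an illustration, recounts identities from two aligned rows. The function name `count_identities` is hypothetical, and the sketch deliberately does not compute Positives, which would require a substitution matrix that is not specified here.

```python
def count_identities(query, sbjct):
    """Count identical columns and build an identity-only match line."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    matches = []
    identical = 0
    for q, s in zip(query, sbjct):
        if q == s and q != "-":
            identical += 1
            matches.append(q)    # identical residue is echoed
        else:
            matches.append(" ")  # mismatch or gap column
    return identical, "".join(matches)


# Final block of the Pseudomonas aeruginosa alignment in Example 995.
ident, midline = count_identities("NLFQEAEITI", "EALQQARVEV")
print(ident, repr(midline))
```

Summing the per-block counts over a whole alignment gives the numerator of the Identities ratio.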

Example 996 

A DNA sequence (GBSx1056) was identified in S.agalactiae <SEQ ID 3053> which encodes the amino 
acid sequence <SEQ ID 3054>. This protein is predicted to be a transcriptional regulator. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0904 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9541> which encodes amino acid sequence <SEQ ID 9542> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04499 GB:AP001509 transcriptional regulator [Bacillus halodurans] 
Identities = 98/309 (31%) , Positives = 178/309 (56%) , Gaps = 1/309 (0%) 

Query: 6 ERQKLLAKVAYLYYMEGKSQSEIANELGIYRTTISRMLAKAREEGLVRIEISDFNPEIFQ 65
         E ++L+ KVA LYY EG +Q+++A ++G+ R IS++L KA+E+G+V I I D N +
Sbjct: 5 EERRLIVKVASLYYFEGWTQAQVAKKIGVSRPVISKLLNKAKEQGIVEIYIKDENIHTVE 64

Query: 66 LESYFKSKYHLKDIEIVSSRKDSDTSEIEKDLAHVAAAMIRKKIKENDKVGIAWGRTLSK 125
          LE + KYHLK+ +V + I++ + + + K IK D +GI+WG T+S
Sbjct: 65 [sequence not recovered]

[Alignment blocks beginning at Query 126 / Sbjct 124, Query 186 / Sbjct 184 and Query 246 / Sbjct 244 not recovered. Surviving match-line fragment: RF++ G P++ L + IGI L++L+++P I V+ G +K ++ A LK GY++ LVT]

Query: 306 DFSTALNIL 314
           D STA +++
Sbjct: 304 DDSTAQSLI 312



A related DNA sequence was identified in S.pyogenes <SEQ ID 3055> which encodes the amino acid 
sequence <SEQ ID 3056>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2123 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 165/324 (50%), Positives = 238/324 (72%)

Query: 3 MKLERQKLLAKVAYLYYMEGKSQSEIANELGIYRTTISRMLAKAREEGLVRIEISDFNPE 62
         MK ER+ +LLAKVAYL+ Y+ +GKSQ+ I+ E+ IYRTT+ RMLAKA+EEG+VRIEI+D++ +
Sbjct: 1 MKEERI^LLAKVAYLHYVQGKSQTLISKEMNIYRTTVCRMLAKAKEEGIVRIEIADYDAD 60

Query: 63 IFQLESYFKSKYHLKDIEIVSSRKDSDTSEIEKDLAHVAAAMIRKKIKENDKVGIAWGRT 122
          +F LE Y + +Y L+ +++V ++ + + ++A AA + R +K+ DK+G++WG T
Sbjct: 61 LFALEEYVRQQYGLEKLDLVPNQVEDTPMDTLTNVAKTAAEVFRHVVKDGDKIGLSWGAT 120

Query: 123 LSKVVEAMRPHPVSQVSFVPLAGGPSHINARYHVNTLVYEMSRRFQGSCTFINATLVQEN 182
           LS +++ + P + V PLAGGPSHINA+YHVNTLVY ++R F G+ F+NA ++QE+
Sbjct: 121 LSCLiyroELNPKAMKDVFlYPLAGGPSHINAKYHVNTLVYRLARIFHGNSAFMNAMVIQED 180

Query: 183 ANIJUCGILTSKYFEGLMDNWEKLDVAIVGVGGKPKSNEQ-QWLDLLNQDDFQCLDEEARV 241
           +IAKGIL SKYF ++ +W++LD+A+VG+GG+P S EQ QW DLL D L E AV
Sbjct: 181 KHLAKGILQSKYFNDILTSVTOQLDLALVGIGGEPNSLEQSQWRDLLTSSDHDQLKYEKAV 240

Query: 242 GEITCRFFKHSGDPVNQHLAKRTIGITLEQLQKVPMRIAVAHGNYKAAALLAVLKKGYIN 301
           GE+ CRFF+ +G PV L RTIGI+LEQL++VP +AVA G +KA A+LA LK G+IN
Sbjct: 241 GEVCCRFFDQAGQPVYTGLQDRTIGISLEQLRRVPKTMAVATGKHKAKAILAALKAGFIN 300

Query: 302 HLVTDFSTALNILRLDKDTFVDTI 325
           +LVTD T L +L LD+D ++ +
Sbjct: 301 YLVTDKETMLAVLALDEDIDLNNV 324



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 997 

A DNA sequence (GBSx1057) was identified in S.agalactiae <SEQ ID 3057> which encodes the amino 
acid sequence <SEQ ID 3058>. This protein is predicted to be PTS enzyme III cel (celC). Analysis of this 
protein sequence reveals the following: 

Possible site: 55

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9543> which encodes amino acid sequence <SEQ ID 9544> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA23551 GB:M93570 PTS enzyme III cel [Escherichia coli]
Identities = 42/102 (41%), Positives = 70/102 (68%)

Query: 4 EIIVADQIIMGLILNAGDAKQHIYQALKLAKEGNFAESKIEIELADSALLEAHNLQTQFL 63
         E+ ++++MGLI+N+G A+ Y ALK AK+G+FA +K ++ + AL EAH +QT+ +
Sbjct: 13 EVEELEEVVMGLIINSGQARSLAYAALKQAKQGDFAAAKAMMDQSRMALNEAHLVQTKLI 72

Query: 64 AQEAGGTRTDISALFIHSQDHLMTSITEINLIKEIIDLRQEL 105
          +AG + +S + +H+QDHLMTS+ LI E+I+L ++L
Sbjct: 73 EGDAGEGKMKVSLVLVHAQDHLMTSMLARELITELIELHEKL 114

A related DNA sequence was identified in S.pyogenes <SEQ ID 3059> which encodes the amino acid 
sequence <SEQ ID 3060>. Analysis of this protein sequence reveals the following: 

Possible site: 17
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC74806 GB:AE000268 PEP-dependent phosphotransferase enzyme III
for cellobiose, arbutin, and salicin [Escherichia coli]
Identities = 33/97 (40%), Positives = 66/97 (67%)

Query: 7 DQIIMGLILNAGDAKQHIYQALKCAKEDDYATSEKEMALADDALLEAHNLQTQFLAQEAS 66
         ++++MGLI+N+G A+ Y ALK AK+ D+A ++ M + AL EAH +QT+ + +A
Sbjct: 18 EEVVMGLIINSGQARSLAYAALKQAKQGDFAAAKAMMDQSRMALNEAHLVQTKLIEGDAG 77

Query: 67 GNKSEITALFVHSQDHLMTTITEINLIKEIIDLRKEL 103
          K +++ + VH+QDHLMT++ LI E+I+L ++L
Sbjct: 78 EGKMKVSLVLVHAQDHLMTSMLARELITELIELHEKL 114

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/103 (78%), Positives = 94/103 (90%) 

Query: 3 MEIIVADQIIMGLILNAGDAKQHIYQALKLAKEGNFAESKIEIELADSALLEAHNLQTQF 62
         M++IV DQIIMGLILNAGDAKQHIYQALK AKE ++A S+ E+ LAD ALLEAHNLQTQF
Sbjct: 1 MQVIVPDQIIMGLILNAGDAKQHIYQALKCAKEDDYATSEKEMALADDALLEAHNLQTQF 60

Query: 63 LAQEAGGTRTDISALFIHSQDHLMTSITEINLIKEIIDLRQEL 105
          LAQEA G +++I+ALF+HSQDHLMT+ITEINLIKEIIDLR+EL
Sbjct: 61 LAQEASGNKSEITALFVHSQDHLMTTITEINLIKEIIDLRKEL 103

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 998 

A DNA sequence (GBSx1058) was identified in S.agalactiae <SEQ ID 3061> which encodes the amino 
acid sequence <SEQ ID 3062>. This protein is predicted to be PTS system, cellobiose-specific IIB 
component (celA). Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF94440 GB:AE004207 PTS system, cellobiose-specific IIB
component [Vibrio cholerae]
Identities = 45/100 (46%), Positives = 62/100 (62%)

Query: 1 MIKIGLFCAAGFSTGMLVNNMKIAADKEGIEAHIEAYSQGKIADYAKDLDVALLGPQVSY 60
         M KI L C+AG ST MLV M+ AA+ +GIE I+A S + ++ DV LLGPQV +
Sbjct: 1 MKKILLCCSAGMSTSMLVKKMQQAAESKGIECKIDALSVNAFEEAIQEYDVCLLGPQVRF 60

Query: 61 TLDKSKSICDEYGVPIAVIPMADYGMLDGVKVLKLALSLL 100
          L++ + DEYG IA I YGM+ G +VL+ AL L+
Sbjct: 61 QLEELRKTADEYGKNIAAISPQAYGMMKGDEVLQQALDLI 100

A related DNA sequence was identified in S.pyogenes <SEQ ID 3063> which encodes the amino acid 
sequence <SEQ ID 3064>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF94440 GB:AE004207 PTS system, cellobiose-specific IIB
component [Vibrio cholerae]
Identities = 43/100 (43%), Positives = 58/100 (58%)

Query: 8 MIKIGLFCAAGFSTGMLVNNMKVAAEKKGIDCQIEAYAQGKLADYAPLLDVALLGPQVAY 67
         M KI L C+AG ST MLV M+ AAE KGI+C+I+A + + DV LLGPQV +
Sbjct: 1 MKKILLCCSAGMSTSMLVKKMQQAAESKGIECKIDALSVNAFEEAIQEYDVCLLGPQVRF 60

Query: 68 TLDKSEAICKDNDIPIAVIPMADYGMLDGNKVLDLALSLV 107
          L++ + IA I YGM+ G++VL AL L+
Sbjct: 61 QLEELRKTADEYGKNIAAISPQAYGMMKGDEVLQQALDLI 100

An alignment of the GAS and GBS proteins is shown below. 

Identities = 79/101 (78%) , Positives = 92/101 (90%) 

Query: 1 MIKIGLFCAAGFSTGMLVNNMKIAADKEGIEAHIEAYSQGKIADYAKDLDVALLGPQVSY 60
         MIKIGLFCAAGFSTGMLVNNMK+AA+K+GI+ IEAY+QGK+ADYA LDVALLGPQV+Y
Sbjct: 8 MIKIGLFCAAGFSTGMLVNNMKVAAEKKGIDCQIEAYAQGKLADYAPLLDVALLGPQVAY 67

Query: 61 TLDKSKSICDEYGVPIAVIPMADYGMLDGVKVLKLALSLLE 101
          TLDKS++IC + +PIAVIPMADYGMLDG KVL LALSL++
Sbjct: 68 TLDKSEAICKDNDIPIAVIPMADYGMLDGNKVLDLALSLVK 108

SEQ ID 3062 (GBS180) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 39 (lane 4; MW 12.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 2; MW 37.6kDa). 

The GBS180-GST fusion product was purified (Figure 204, lane 8) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 298), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 999 

A DNA sequence (GBSx1059) was identified in S.agalactiae <SEQ ID 3065> which encodes the amino 
acid sequence <SEQ ID 3066>. This protein is predicted to be PTS system, cellobiose-specific IIC component 
(celB). Analysis of this protein sequence reveals the following: 



Possible site: 40
>>> Seems to have no N-terminal signal sequence

[INTEGRAL transmembrane predictions: likelihood values and segment start positions not recovered. Surviving segment ends and ranges: 362 ( 334 - 374); 198 ( 178 - ...); 45 ( 27 - ...); 308 ( 289 - ...); 413 ( 395 - 416); 93 ( 72 - ...); 244 ( 222 - 246)]

Final Results

bacterial membrane --- Certainty=0.5670 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA17390 GB:U07818 cellobiose phosphotransferase enzyme II''
[Bacillus stearothermophilus]
Identities = 160/415 (38%), Positives = 251/415 (59%), Gaps = 13/415 (3%)

Query: 15 KFVNMRGIIALKDGMLAILPLTVVGSLFLILGQLPFKGLNQAIANVFGPEWTEPFMQVYS 74
          K R + A++DG++ +PL ++GSLFLI+G LP G N+ +A FG W + +
Sbjct: 18 KIAEQRHLQAIRDGIILSMPLLIIGSLFLIVGFLPIPGYNEIWIAKWFGEHWLDKLLYPVG 77

Query: 75 GTFAIMGLISCFAIAYAYAKNSSVEPLPAGVLSLSSFFILMKSSYIPVKGEA------IA 128
          TF IM L+ F +AY A+ V+ L AG +SL++F +L +P E ++
Sbjct: 78 ATFDIMALVVSFGVAYRIAEKYKVDALSAGAISLAAF-LLATPYQVPFTPEGAKETIMVS 136

Query: 129 DAISKVWFGGQGIIGAIIIGLVVGAIYTWFIQHHIVIKMPEQVPQAIAKQFEAMIPAFVI 188
           I W G +G+ A+I+ +V IY IQ +IVIK+P+ VP A+A+ F A+IP +
Sbjct: 137 GGIPVQWVGSKGLFVAMILAIVSTEIYRKIIQKNIVIKLPDGVPPAVARSFVALIPGAAV 196

Query: 189 FLLSMIVYLIAKVTTGGTFIEMIYDIIQVPLQGLTGSLYGAIGIAFFISFLWWFGVHGQS 248
           ++ + LI ++T +F ++ ++ PL L GS++GAI + LW G+HG +
Sbjct: 197 LVVVWVARLILEMTPFESFHNIVSVLLNKPLSVLGGSVFGAIVAVLLVQLLWSTGLHGAA 256

Query: 249 VVNGIVTALLLSNLDANKSLIAAN-RLTLDNGAHIVTQQFLDSFLILSGSGITFGLVIAM 307
           +V G++ + LS +D N+ + N L N ++TQQF D ++ + GSG T L + M
Sbjct: 257 IVGGVMGPIWLSLMDENRMVFQQNPNAELPN---VITQQFFDLWIYIGGSGATLALALTM 313

Query: 308 LFAAKSKQYKALGKVAAFPAIFNVNEPIVFGFPIVMNPVMFLPFILVPVLAALIVYGAIA 367
           +F A+S+Q K+LG++A P IFN+NEPI FG PIVMNP++ +PFILVPV+ ++ Y A+A
Sbjct: 314 MFFARSRQLKSLGRLAIAPGIFNINEPITFGMPIVMNPLLIIPFILVPVVLVWSYAAMA 373

Query: 368 VGFMQPFSGVTLPWSTPAIISGFMVGGWQ--GALVQIVILAISTAVYFPFFKIQD 420
           G + SGV +PW+TP +ISG++ G + G+++QIV I+ A+Y+PFF I D
Sbjct: 374 TGLVAKPSGVAVPWTTPIVISGYLATGGKISGSILQIVNFFIAFAIYYPFFSIWD 428



A related DNA sequence was identified in S.pyogenes <SEQ ID 2215> which encodes the amino acid 
sequence <SEQ ID 2216>. Analysis of this protein sequence reveals the following: 



Possible site: 40
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = [not recovered] Transmembrane 347 - 363 ( 335 - 373)
INTEGRAL Likelihood = [not recovered] Transmembrane 29 - 45 ( 27 - 50)
INTEGRAL Likelihood = [not recovered] Transmembrane 182 - 198 ( 179 - 204)
INTEGRAL Likelihood = [not recovered] Transmembrane 398 - 414 ( 395 - 420)
INTEGRAL Likelihood = [not recovered] Transmembrane 293 - 309 ( 291 - 314)
INTEGRAL Likelihood = -3.61 Transmembrane 140 - 156 ( 134 - 160)
INTEGRAL Likelihood = -2.60 Transmembrane 229 - 245 ( 229 - 246)
INTEGRAL Likelihood = -0.75 Transmembrane 72 - 88 ( 72 - 88)

Final Results

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 366/428 (85%), Positives = 402/428 (93%), Gaps = 1/428 (0%) 

Query: 1 MSKFDSQKIITPIMKFVNMRGIIALKDGMLAILPLTVVGSLFLILGQLPFKGLNQAIANV 60
         M+K Q II PIM FVNMRGIIALKDGMLAILPLTVVGSLFLI GQ+PF+G+N AIA+V
Sbjct: 1 MAKMIMQNIIKPIMTFVNMRGIIALKDGMLAILPLTVVGSLFLIAGQIPFQGVIfflAIASV 60

Query: 61 FGPEWTEPFMQVYSGTFAIMGLISCFAIAYAYAKNSSVEPLPAGVLSLSSFFILMKSSYI 120
          FG +WTEPFMQVY GTFAIMGLISCFAI Y+YAKNS VEPLP+GVLSLS+FFIL++SSY+
Sbjct: 61 FGADWTEPFMQVYHGTFAIMGLISCFAIGYSYAKNSGVEPLPSGVLSLSAFFILLRSSYV 120

Query: 121 PVKGEAIADAISKVWFGGQGIIGAIIIGLVVGAIYTWFIQHHIVIKMPEQVPQAIAKQFE 180
           P +GEAI DAISKVWFGGQGIIGAI+IGL VGA+YT FI+ HIVIKMP+QVPQAIAKQFE
Sbjct: 121 PAEGEAIGDAISKVWFGGQGIIGAIVIGLTVGAVYTTFIRRHIVIKMPDQVPQAIAKQFE 180

Query: 181 AMIPAFVIFLLSMIVYLIAK-VTTGGTFIEMIYDIIQVPLQGLTGSLYGAIGIAFFISFL 239
           AMIPAFVIF LSM+VY+IAK VT GGTFIEMIYD+IQVPLQGLTGSLYGA+GIAFFISFL
Sbjct: 181 AMIPAFVIFTLSMLVYIIAKSVTGGGTFIEMIYDVIQVPLQGLTGSLYGALGIAFFISFL 240

[Alignment block Query 240 - 299 / Sbjct 241 - 300 not recovered]

Query: 300 TFGLVIAMLFAAKSKQYKALGKVAAFPAIFNVNEPIVFGFPIVMNPVMFLPFILVPVLAA 359
           TFGLV+AM+FAAKSKQYKALGKVAAFPA+FNVNEP+VFGFPIVMNPVMFLPFILVPVLAA
Sbjct: 301 TFGLVVAMIFAAKSKQYKALGKVAAFPALFNVNEPVVFGFPIVMNPVMFLPFILVPVLAA 360

Query: 360 LIVYGAIAVGFMQPFSGVTLPWSTPAIISGFMVGGWQGALVQIVILAISTAVYFPFFKIQ 419
           L VYGAIA+GFMQPF+GVTLPWSTPAIISGFMVGGWQGA+VQI+IL +ST VYFPFFKIQ
Sbjct: 361 LTVYGAIAIGFMQPFAGVTLPWSTPAIISGFMVGGWQGAIVQILILIMSTLVYFPFFKIQ 420

Query: 420 DNITYKNE 427
           DN+ Y+NE
Sbjct: 421 DNMAYQNE 428

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
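
Each INTEGRAL line in the membrane-protein reports above gives a likelihood score and the residue range of one predicted membrane-spanning segment. The sketch below, in Python and for illustration only, reads such lines and orders the segments along the sequence. The names `Segment` and `parse_integral` are hypothetical, and the sample text is limited to the three lines of the S.pyogenes report whose scores are legible.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Score, then the first residue range; the bracketed outer range is ignored.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral(text):
    """Return predicted segments sorted by start position."""
    segments = [
        Segment(float(score), int(start), int(end))
        for score, start, end in INTEGRAL.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


report = """\
INTEGRAL Likelihood = -3.61 Transmembrane 140 - 156 ( 134 - 160)
INTEGRAL Likelihood = -2.60 Transmembrane 229 - 245 ( 229 - 246)
INTEGRAL Likelihood = -0.75 Transmembrane 72 - 88 ( 72 - 88)
"""
for seg in parse_integral(report):
    print(seg.start, seg.end, seg.likelihood)
```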

Example 1000 

A DNA sequence (GBSx1060) was identified in S.agalactiae <SEQ ID 3067> which encodes the amino 
acid sequence <SEQ ID 3068>. This protein is predicted to be formate acetyltransferase 2 (pflB). Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5049 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.
D GB:AE000184 putative formate acetyltransferase
[Escherichia coli K12]
Identities = 414/805 (51%), Positives = 555/805 (68%), Gaps = 14/805 (1%)

Query: 25 LTER^SYRDKVLD-KKPFIDAERAILVTEaYQKHQEKPNVLKRAYMLQNILEKMTIYID 83 

L++R+ ++++ ++ KP + ERA TE YQ+H +KP ++RA L + L TI+I 
Sbjct: 9 LSDRIKAHKNALVHIVKPPVCTERAQHYTEMYQQHLDKPIPWRMjALAHHLANRTIWIK 68 

Query: 84 DETMIVGNQASSDKDAPIFPEYTLEFWNELDLFEKRDGDVFYITEETKEQIRNIAPFWE 143 

+ +I+GNQAS + APIFPEYT+ ++ E+D R G F ++EE K + + P+W 
Sbjct: 69 HDELIIGNQASEVPJ^PIFPEYTVSWIEKEIDDtADRPGAGFAVSEENKRVLHEVCPWWR 128 

Query: 144 NWWLRARAGVMLPEEVQVYMETGFFGKEG™ 203 

++ R M +E + + TG EG M SGDAHLAVN+ LLE+GL G ++ + 
Sbjct: 129 GQTVQDRCYGMFTDEQKGLLATGIIKAEGNMTSGDAHLAVNFPLLLEKGLDGLREEVAER 188 

Query: 204 KADLDLTKPESIDKYHFYDSILITIEAVKTYAERFAILAKKQAKTANAK-RRQELLDIAS 262 

++ ++LT E + F +11+ AV + ERFA LA++ AT + RR ELL +A 
Sbjct: 189 RSRINLTVLEDLHGEQFLKAID I VLVAVS 3H I ERFAALAREMAATETRESRRDELLAMAE 248 

Query: 263 ICERVPYYPAETFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVKSDLEAGRETE 322 

C+ + + P +TF +A+Q +FIQ ILQIESNGHS+S+GR DQY+YPY + D+E + + 
Sbjct: 249 WCDLIAHQPPQTFWQALQLCYFIQLILQIESNGHSVSFGRMDQYLYPYYRRDVELNQTLD 308 

Query: 323 -DSIVERLTNLWIKTITINKVRSQAHTFSSAGSPLYQNVTIGGQTR HKEDAVNPLSF 378 

+ +E h + W+K + +NK+RS +H+ +SAGSPLYQNVTIGGQ DAVNPLS+ 
Sbjct: 309 REHAIEMLHSCWLKLLEVNKIRSGSHSKASAGSPLYQNVTIGGQNLVDGQPMDAWPLSY 368 

Query: 379 LVLKSVAQTHLPQPNLTVRYHANLDKSFMNEAI EVMKLGFGMPAFNNDE III PS FI KKGV 438 

+L+S + QPNL+VRYHA + F++ ++V++ GFGMPAFNNDEI + IP FIK G+ 
Sbjct: 369 AILESCGRLRSTQPNLSVRYHAGMSNDFLDACVQVIRCGFGMPAFNNDEIVIPEFIKLGI 428 

Query: 439 SEEDAYDYSAIGCVETAVPGKWGYRCTGMSYINFPKVLLITMNDGIDPASGKRFAP 494 

+DAYDY+AIGC+ETAV GKWGYRCTGMS+INF +V+L + G D SGK F P 
Sbjct: 429 EPQDAYDYAAIGCIETAVGGKWGYRCTGMSFINFARVMLAALEGGHDATSGKVFLPQEKA 488 

Query: 495 -SYGHFTQMTSYKELKEAWDKTLRYLTRMSVIVENAIDISLEREVPDILCSALTDDCIGR 553 

S G+F ++ E+ +AWD +RY TR S+ +E +D LE V DILCSAL DDCI R 
Sbjct: 489 LSAGNFN NFDEVMDAWDTQIRYYTRKSIEIEYVVDTMLEENVHDILCSALVDDCIER 545 

Query: 554 GKHLKEGGAvYDYISGLQVGIANLSDSI^AAIiKKLVFEEKRLTTLEVWQALQSDYAGPRGE 613 

K +K+GGA YD++SGLQVGIANL +SLAA+KKLVFE+ + ++ AL D+ G E 
Sbjct: 546 AKS I KQGGAKYDWVSGLQVGIANLGNSIiAAVKKLVFEQGAIGQQQLAAALADDFDGLTHE 605 

Query: 614 EIRQMLINEAPKYGNDDDYADSLVRECYDVYVEEIAKYPNTRYGRGPIGGIRYSGTSSIS 673 

Sbjct: 606 C 

Query: 674 ANVGQGRGTLATPDGRHAGTPLAEGCSPSHNI4DKKGPTSVLKSVSKLPTDEIVGGVLLNQ 733 

ANV G T+ATPDGR A TPLAEG SP+ D GPT+V+ SV KLPT I+GGVLLNQ 
Sbjct: 666 ANVPFGAQTMATPDGRKAHTPLAEGASPASGTDHLGPTAVIGSVGKLPTAAILGGVLLNQ 725 

Query: 734 KVNPQTLAKEEDKQKL I ALLRTFFNRLHGYH I QYNWSRETL IDAQKHPEKHRDLI VRVA 793 

K+NP TL E DKQKL+ LLRTFF G+HIQYN+VSRETL+DA+KHP+++RDD+YRVA 
Sbjct: 726 KLNPATLENESDKQKLMILLRTFFEVHKGWHIQYNIVSRETLLDAKKHPDQYRDLVVRVA 785 

Query: 794 GYSAFFNVLSKATQDDI IARTEHAL 818 

GYSAFF LS QDDIIARTEH L 
Sbjct: 786 GYSAFFTALSPDAQDDI IARTEHML 810 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3069> which encodes the amino acid 
sequence <SEQ ID 3070>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4763 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 694/803 (86%) , Positives = 747/803 (92%) 

Query: 16 QNSQKHFGYLTERMYSYRDKVLDKKPFIDAERAILVTEAYQKHQEKPNVLKRAYMIjQNIL 75 

+ +FG+LT+RM YR+ VLDKKP+IDAERAIL TEAYQKHQ KP LKRAYMLQ IL 
Sbjct: 3 ETKSPYFGHLTDRMTHYREAVLDKKPYIDAERAIIATEAYQKHQNKPANLKRAYMLQTIL 62 

Query: 76 EKMTIYIDDETM1VGNQASSDKDAPIFPEYTLEFWMELDLFEKRDGDVFYITEETKEQI 135 

E MTIYI+DE++I GNQASS+KDAPIFPEYTLEFV+NELDLFEKRDGDVFYITEETK+Q+ 
Sbjct: 63 ENMTIYIEDESIjIAGNQASSNKDAPIFPEYTLEFVLNELDLFEKRDGDVFYITEETKQQL 122 

Query: 136 RNIAPFWEN1^RARAGVMLPEEVQOT^1ETG?FGMEGKJ5NSGDAHLAVNYQKLLEEGLIG 195 

R+IAPFWENNNLRAR GV+LPEEVQVYMETGFFGMEGKMNSGDAHLAVNYQKLLE GL G 
Sbjct: 123 RDIAPFWENNNLRARCGvIiLPEEVQVYMETGFFGNffiGKMSrSGDAHIAVlSreQKLLEHGLKG 182 

Query: 196 FEKKARKAKADLDLTKPES I DICYHF YDS I L IT I EAVKTYAERFAI LAKKQAKTANAKRRQ 255 

FE++AR AKA LDLT PE+IDKYHFYDS+ I I+AVKTYA+R+A LA++ AKTA +R+ 
Sbjct: 183 FEEPARAAKAALDLTIPENIDKYHFYDSVFIVIDAVKTYAKRYAKIARELAKTAKPERQA 242 

Query: 256 ELLDIASICERVPYYPAETFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVKSDL 315 

ELLDIA IC++VPY PA+TFAFAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVK+DL 
Sbjct: 243 ELLDIARICDKVPYEPAKTFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVKADL 302 

Query: 316 EAGRETEDSIVERLTNLWIKTITINKVRSQAHTFSSAGSPLYQNVTIGGQTRHKEDAVNP 375 

EAGRETED+IVERLTNLWIKT+TINKVRSQAHTFSSAGSPLYQNVTIGGQTR K+DAVNP 
Sbjct: 303 EAGRETEDTIVERLTOLWIKTLTIMCVRSQAHTFSSAGSPLYQNVTIGGQTRDKKDAVNP 362 

Query: 376 LSFLVLKSVAQTHLPQPNLTWYHA^DKSFMNEAIEVMKLGFGMPAFNNDEIIIPSFIK 435 

LS+LVL+SVAQT LPQPNLTVRYH LD +FMNE IEVMKLGFGMPA NNDEIIIPSFIK 
Sbjct: 363 LSYLVLRSVAQTKLPQPNLTTOYHKGLDOTFNOECIEVMKLGFGMPAMNNDEIIIPSFIK 422 

Query: 436 KGVSEEDAYDYSAIGCVETAVPGKWGYRCTGMSYXNFPK^LITMNDGIDPASGKRFAPS 495 

KGVSEEDAYDYSAIGCVETAVPGKWGYRCTGMSYINFPK+LLITMNDGIDPASGKRFA 
Sbjct: 423 KGVSEEDAYDYSAIGCVETAVPGKWGYRCTGMSYIKFPKILLITMNDGIDPASGKRFAKG 482 

Query: 496 YGHFTQMTSYKELKEAWDKTLRYLTRMSVIVE25AIDISLEREVPDILCSALTDDCIGRGK 555 

+GHF MTSY+ELK AWD TLR +TRMSVIVENAID+ LEREVPDILCSALTDDCIGRGK 
Sbj ct : 483 HGHFKDMTSYEELKAAWDATLREITRMSVIVENRIDLGLEREVPDILCSALTDDCIGRGK 542 

Query: 556 HLKEGGAVYDYISGLQVGIANLSDSLAALPCKLVFEEI^RLTTLEVWQALQSDYAGPRGEEI 615 

IiKEGGAVYDYISGLQVGIANLSDSLAALKKLVFEE RLT E+W+AL+SD+AG RGE+I 
Sbjct: 543 TLKEGGAVYDYISGLQVGIANLSDSLAALKKLVFEEGRLTPEELWKALESDFAGERGEDI 602 

Query: 616 RQMLINEAPECYGNDDDYADSLVRECYDVYV3EIAKYPNTRYGRGPIGGIRYSGTSSISAN 675 

RQMLIW-1-APKYGNDDDYADSLV E YD Y++EIAKYPNTRYGRGPIGGIRYSGTSSISAN 
Sbjct: 603 RQMLIHDAPKYGNDDDYADSLWEAYDTYIDEIAKYPNTRYGRGPIGGIRYSGTSSISAN 662 

Query: 676 VGQGRGTIATPDGRHAGTPLAEGCSPSHNMDKKGPTS\rLKSVSKDPTDEIVGGVLLNQKV 735 

VGQG+GTLATPDGRHAGTPLAEGCSP H+MDKK3PTSVLKSV+KLPTDEIVGGVLLNQKV 
Sbjct: 663 VGQGKGTLATPDGRHAGTPLAEGCSPEHSMDia<GPTS\rLKSVAKLPTDEIVGGVLLNQKV 722 

Query: 736 NPQTIAKEEDKQKLIALLRTFFNRLHGYHIQYHWSRETLIDAQKHPEKHRDLIVRVAGY 795 

NPQTLAKEEDK KL+ALLRTFFNRLHGYHIQYNWSRETLIDAQECHPEKHRDLIVRVAGY 
Sbjct: 723 MPQTIAKEEDKLKLMALLRTFFNRLHGYHIQYNWSRETLIDAQKHPEKHRDLIVRVAGY 782 

Query: 796 SAFFNVLSKATQDDI IARTEHAL 818 

SAFFNVLSKATQDDI I RTEH L 
Sbjct: 783 SAFFNVLSKATQDDI IERTEHTL 805 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1001 

A DNA sequence (GBSx1061) was identified in S.agalactiae <SEQ ID 3071> which encodes the amino 
acid sequence <SEQ ID 3072>. Analysis of this protein sequence reveals the following: 

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1024 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MEFLLDTLNLEAIKKWHHILPLAGVTSNPTIAKKEGDIHFFQRIRDVREIIGREASLHVQ 60
         M+ ++D +N+E IK I + GVTSNP+I KG + I+ +RE IG + LHVQ
Sbjct: 1 MKLIIDDVNIEKIKDVFSIFQIDGVTSNPSILHKYGKQPYEILIK-IREFIGENSELHVQ 59

[Alignment blocks beginning at Query 61 / Sbjct 60, Query 121 / Sbjct 120 and Query 181 / Sbjct 180 not recovered. Surviving match-line fragments: LA AGA Y APY NR++NL; AV +F D++]



A related DNA sequence was identified in S.pyogenes <SEQ ID 3073> which encodes the amino acid 
sequence <SEQ ID 3074>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1090 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/222 (71%), Positives = 194/222 (87%) 

Query: 1 MEFLLDTLNLEAIKKWHHILPLAGVTSNPTIAKKEGDIHFFQRIRDVREIIGREASLHVQ 60
         ME++LDTL+LEAIKKWHHILPLAGVTSNP+IAKKEG+I FF+RIR+VR IIG +AS+HVQ
Sbjct: 1 MEYMLDTLDLEAIKKWHHILPLAGVTSNPSIAKKEGEIDFFERIREVRAIIGDKASIHVQ 60

Query: 61 VVAKDYQGILDDAAKIRQETDDDIYIKVPVTPDGLAAIKTLKAEGYNITATAIYTSMQGL 120
          V+A+DY+GIL DAA+IR++ D +Y+KVPVT +GLAAIKTLKAEGY+ITATAIYT+ QGL
Sbjct: 61 VIAQDYEGILKDAAEIRRQCGDSVYVKVPVTTEGLAAIKTLKAEGYHITATAIYTTFQGL 120

Query: 121 LAISAGADYLAPYFNRMENLDIDATQVIKELAQAIERTGSSSKILAASFKNASQVTKALS 180
           LAI AGADYLAPY+NRMENL+ ID VI++LA+AI R ++SKILAASFKN +QV K+ +
Sbjct: 121 LAIEAGADYMPYYNRirat^IDPEAVIEQLAEAINREN^ 180

Query: 181 QGAQSITAGPDIFESVFAMPSIAKAVNDFADDWKASQHSEHI 222
           GAQ+ITAGPD+FE+ FAMPSI KAV+DF DW+A H + I
Sbjct: 181 LGAQAITAGPDVFEAGFAMPSIQKAVDDFGKDWEAIHHRKSI 222

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1002 

A DNA sequence (GBSx1062) was identified in S.agalactiae <SEQ ID 3075> which encodes the amino 
acid sequence <SEQ ID 3076>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3086 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9545> which encodes amino acid sequence <SEQ ID 9546> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA22477 GB:M65289 glycerol dehydrogenase [Bacillus
stearothermophilus]
Identities = 199/362 (54%), Positives = 271/362 (73%), Gaps = 2/362 (0%)

Query: 4 KVFASPSRYIQGKDALFQSIEHIKSLGQTPLILCDDVWNIVGERFLSYLQD-DLLPHRV 62 

+VF SP++Y+QGK+ 4 + +++ +G +++ D++V+ I G ++ L+ ++ V 
Sbjct: 5 RVFISPAKTVC<3KNVITKIANYIM3IGNKTWI^ 64 

Query: 63 SFNGEaSDNEINRWAVAKEKNSDLIIGLGGGKTIDSAKAIADKVNLPVVIAPTVASTDA 122 

F+GEAS NE+ R+ +A++ + ++IG+GGGKT+D+AKA+AD+++ +V1 PT ASTDA 
Sbjct: 65 VFSGEASRNEvERIANIARKAEAAIVlGVGGGKTLDTAKAVADELDAYIVIVPTAASTDA 124 

Query: 123 PTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVIAQAPKRLLASGIADGLATWVEARAV 182 

PTSALSVIY+D+G FE Y FY KNPDLVLVDT++IA AP RLLASGIAD LATWVEAR+V 
Sbjct: 125 PTSALSVIYSDDGVFESYRFYKKNPDLVLVDTKIIANAPPRLIASGIADALATWVEARSV 184 

Query: 183 LQKKGIA^GGRQTIAGVAIAQACERTLETOSLCAI^CIXUCVVTKALENVIEAOT 242 

++ G MAGG T+A AIA+ CE+TLF A + AKWT ALE V+EANTLLSG 

Sbjct: 185 IKSGGKTMAGGIPTIAAEAIAEKCEQTLFKYGKIAY3SVKAKVVTPALEAWEANTLLSG 244 



++EI+RYI LY 



Query: 303 AIGMPTTLAELHLGDATYEELLKVGQQATIEGETIHEMPFKISAEDVAAALLTVDRYVSN 362 

++ +P TL ++ L DA+ E++LKV + AT EGETIH F ++A+DVA A+ D+Y 
Sbjct: 305 SLDLPVTLEDI KLKDASRED I LKVAKAATAEGET I HN - AFNVTADDVADAI FAADQYAKA 363 

Query: 363 HQ 364 

Sbjct: 364 YK 365 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3077> which encodes the amino acid 
sequence <SEQ ID 3078>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.62 Transmembrane 101 - 117 ( 98 - 119) 

Final Results 

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
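
The "Final Results" block lists one certainty value per predicted location, and the location marked "Affirmative" is the one with the highest value. The following Python sketch shows one way to read such a block and pick that location. The names `parse_final_results` and `best_localization` are hypothetical, and the pattern is deliberately tolerant of the stray spaces that scanning introduces inside the numbers; it is an illustration only, not the tool that produced these results.

```python
import re

# A result line looks like:
#   "bacterial membrane --- Certainty=0.2848 (Affirmative) < succ>"
# Scanned copies may drop the dashes or split the number ("0 . 0000").
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\("
)


def parse_final_results(text):
    """Map each location to its certainty value as a float."""
    scores = {}
    for site, raw in _RESULT.findall(text):
        # Remove whitespace that was inserted inside the number.
        scores[site] = float(re.sub(r"\s+", "", raw))
    return scores


def best_localization(text):
    """Return the location with the highest certainty value."""
    scores = parse_final_results(text)
    if not scores:
        raise ValueError("no 'Certainty=' lines found")
    return max(scores, key=scores.get)


if __name__ == "__main__":
    block = (
        "bacterial membrane Certainty=0. 2848 (Affirmative) < suco\n"
        "bacterial outside Certainty=0 . 0000 (Not Clear) < suco\n"
        "bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco\n"
    )
    print(parse_final_results(block))
    print(best_localization(block))
```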

The protein has homology with the following sequences in the databases:

>GP:AAA22477 GB:M65289 glycerol dehydrogenase [Bacillus 
stearothermophilus] 
Identities = 202/357 (56%), Positives = 261/357 (72%), Gaps = 1/357 (0%) 

Query: 2 KVFASPSRYIQGKNALFTNVKTLKQLGDSPILLCDDWYGIVGERFESYLIDNGMTPVHV 61 

+VF SP++Y+QGKN + L+ +G+ +++ D++V+ I G + L + V 

Sbjct: 5 RVFISPAKWQGK1JVITKIANYLEGIGNKTWIADEIVWKIAGHTIVNELKKGNIAAEEV 64 

Query: 62 AFNGEASDNEI SRWAI AKENGND VI IGLGGGKTID S AKAI ADLLAVPVI IAPT IASTDA 121 

F+GEAS NE+ R+ IA++ ++IG+GGGKT+D+AKA+AD L ++I PT ASTDA 
Sbjct: 65 VFSGEASRNEVERIANIARKAEAAIVIGVGGGKTLDTAKAVADELDAYIVIVPTAASTDA 124 

Query: 122 PTSALS VTYTDEGAFEKYI FYSKNPDLVLVDTQVI CQAPKRLLASGIADGLATWVEARAV 181 

PTSALSVIY+D+G FE Y FY KNPDLVLVDT++I AP RLLASGIAD LATWVEAR+V 
Sbjct: 125 PTSALSVIYSDDGVFESYRFYKKNPDLVLVDTKIIANAPPRLLASGIADAIATWVEARSV 184 

Query: 182 MQKNGDTMAGGNQTLAGVAIAKACEQTLFADGLKAMASCDRQWTPALENVIEANTLLSG 241 

++ G TMAGG T+A AIA+ CEQTLF G A S +WTPALE V+EANTLLSG 
Sbjct: 185 IKSGGKTMAGGIPTIAAEAIAEKCEQTBFKYGKIAYESVlCAKvVTPALEAVVEANTLLSG 244 

Query: 242 LGFESAG1AAAHAIHNGFTALTGAIHHLTHGEKVAYGTLTQLFLENRSREEIDRYIDFYQ 3 01 

LGFES GLAAAHAI HNGFTAL G IHHLTHGEKVA+GTL QL LE S++EI+RYI+ Y 
Sbjct: 245 LGFESGGLAAAHAIHNGFTALEGEIHHLTHGEKVAFGTLVQLALEEHSQQEIERYIELYL 304 

Query: 302 AIGMPTTLKEMHLDTATQEDFLKIGRQATMAGETIHQMPFVISPEDVAAALVAVDAY 358 

++ +P TL+++ L A++ED LK+ + AT GETIH F ++ +DVA A+ A D Y 
Sbjct: 305 SLDLPVTLEDIKLKDASREDILK^AKAATAEGETIHN-AFNVTADDVADAIFAADQY 360 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 287/361 (79%) , Positives = 325/361 (89%) , Gaps = 1/361 (0%) 

Query: 3 MKVFASPSRYIQGKDALFQSIEHIKSLGQTPLILCDDWYNIVGERFLSYLQDD-LLPHR 61 

MKVFASPSRYIQGK+ALF +++ +K LG +P++LCDDWY IVGERF SYL D+ + P 
Sbjct: 1 MKVFASPSRYIQGKNALFTNVKTLKQLGDSPILLCDDWYGIVGERFESYLIDNGMTPvH 6 0 

Query: 62 VSFNGFASDNEINRWAVAKEKNSDLIIGLSGGKTIDSAKAIADICvNLPWIAPTVASTD 121 

V+FNGEASDNEI+RWA+AKE +D+ 1 IGIiGGGKTIDSAKAIAD + +PV+IAPT+ASTD 
Sbjct: 61 VAFNGEASDNEI SRWAI AKENGNDVI IGLGGGKTIDSAKAI ADLLAVPVI IAPT I ASTD 120 

Query: 122 APTSALSVIYTDEGAFEKYIFYSIQIPDLvIiVDTQVIAQAPICRLLASGIADGLATWVEARA 181 

APTSALSVI YTDEGAFEKYI FYSKNPDLVLVDTQVI QAPKRLLASGIADGLATWVEARA 
Sbjct: 121 APTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVICQAPKRLLASGIADGLATWVEARA 180 

Query: 182 VLQKNGIAMAGGRQTLAGVAIAQACERTLFNDSLQAIAACDAIOArrKALENVIEANTLLS 241 

V+QKNG MAGG QTLAGVAIA+ACE+TLF D L+A+A+CD +WT ALENVIEANTLLS 
Sbjct: 181 VMQKNGDT^GGNQTIAGVAIAKACEQTLFADGLKAMASCDRQVVTPALENVIEANTLLS 240 

Query: 242 GLGFESAGLAAAHAIHHGFTALSGDIHHLTHGEKVAYGTLTQLFLENRPKEEIDRYINLY 301 

GLGFESAGLAAAHAIHNGFTAL+G IHHLTHGEKVAYGTLTQLFLENR +EEIDRYI+ Y 
Sbjct: 241 GLGFESAGLAAAHAIHNGFTALTGAIHHLTHGEICVAYGTLTQLFLENRSREEIDRYIDFY 300 

Query: 302 QAIGMPTTLAELHLGDATYEELLKVGQQATIEGETIHEMPFKISAEDVAAALLTVDRYVSN 362 

QAIGMPTTL E+HL AT E+ LK+G+QAT+ GETIH+-MPF IS EDVAAAL+ VD YV++ 
Sbjct: 301 QAIGMPTTLKEMHLDTATQEDFLKIGRQATMAGETIHQMPFVISPEDVAAALVAVDAYVTS 361 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
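
In the alignments above, the middle row between each "Query" and "Sbjct" row repeats the residue where the two sequences are identical and shows "+" where the substitution is scored as conservative; "Positives" counts both cases. The short Python sketch below tallies these values for one block of three equal-length rows. The function name `score_alignment_block` is hypothetical, and the sketch assumes the three rows have already been stripped of their labels and position numbers and padded to the same length.

```python
def score_alignment_block(query, midline, sbjct):
    """Count (identities, positives, gaps) for one aligned block.

    The three strings must have equal length: the query row, the match
    row printed between the sequences, and the subject row.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("rows must be padded to the same length")
    identities = positives = gaps = 0
    for q, m, s in zip(query, midline, sbjct):
        if q == "-" or s == "-":
            gaps += 1          # a gap in either sequence
        elif q == s:
            identities += 1    # identical residues
            positives += 1     # identities also count as positives
        elif m == "+":
            positives += 1     # conservative substitution
    return identities, positives, gaps


if __name__ == "__main__":
    # Final block of one of the alignments in this document.
    print(score_alignment_block("LYDF", "LY+F", "LYEF"))
```

Summing the per-block counts over all blocks of an alignment gives the numerators of the "Identities", "Positives" and "Gaps" fields.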

Example 1003

A DNA sequence (GBSx1063) was identified in S.agalactiae <SEQ ID 3079> which encodes the amino
acid sequence <SEQ ID 3080>. Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.75 Transmembrane 262 - 278 ( 262 - 279) 

Final Results 

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA88310 GB:AB028865 O-acetylserine lyase [Streptococcus suis] 
Identities = 239/304 (78%), Positives = 273/304 (89%)



Query: 4 1YNSITDLIGNTPIIQLHHIVPEGAAEVYVKLSSFNPGSS\TCDRIALAMIEDAEQKGILK 63 

IY +IT L+G TP+ 1 +L++ IVPEGAAEVYVKL3+FNPGSSVKDRIALAMIEDAE+ G +K 
Sbjct: 3 IYQNITQLVGKTPVIKLtMIVPEGAAEVYVKLE^FNPGSSVKDRIALAMIEDAEKAGTIK 62 

Query: 64 AGDTIVEPTSGNTGIGLAWVGKAKGYNVI IVMPETMS IERRKI IQAYGAQLVLTPGSEGM 123 

GDTIVEPTSGNTG1GLAWVG AKGYNVIIVMPETMS+ERRKIIQAYGA+LVLTPGSEGM 
Sbjct: 63 PGDTIVEPTSGNTGIGLAWVGAAKGYNVI I VMPETMSVERRKI IQAYGAELVLTPGSEGM 122 

Query: 124 KGAIAKAKEISAEQNAWLPLQFNNQANPEIHEKTTGREIIETFGEKGLDAPIAGVGTGGT 183 

KGAIAKAKEI+ E+N W+P QF N +NP++HE TTG+EI+E FG GLDAF++GVGTGGT 
Sbjct: 123 KGAIAKAKEIAEEKNGWPFQFANPSNPKOTEDTTGQEIIiEDFGTTGLDAFVSGVGTGGT 182 

Query: 184 ITGVSRALKKVJMPDVAIYAVEADESAILSGEQPGPHKIQGISAGFIPETLATDSYDHIIR 243 

++GVS LK NPD4AIYAVEADESA+LSGE PGPHKIQGISAGFIP+TL T +YD IIR 
Sbjct: 183 VSGVSHVLKTANPDIAIYAVEADESAVLSGEAPGPHKIQGISAGFIPDTLDTSAYDGIIR 242 

Query: 244 VTSDDAIETGRIIGGLEGFLAGISASAAIYAAIEVAKQLGKGKKVLALLPDNGERYLSTS 303 

V SDDA+ TGR IGG EGFL GIS+ AAI+AAIEVAK+LG GKKVLA+LPDNGERYLST+ 
Sbjct: 243 VKSDDALATGRAIGGKEGFLVGISSGAAIHAAIEVAKELGTGKKVLAILPDNGERYLSTA 302 

Query: 304 LYDF 307 
LY+F 

Sbjct: 303 LYEF 306 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3081> which encodes the amino acid
sequence <SEQ ID 3082>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.48 Transmembrane 262 - 278 ( 262 - 278) 

Final Results 

bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAA88310 GB:AB028865 O-acetylserine lyase [Streptococcus suis] 
Identities = 235/303 (77%) , Positives = 261/303 (85%) 


Query: 4 IYKTITELVGQTPIIKLNRLIPNEAADVT^EAFNPGSSVKDRIALSMIEAAEAEGLIS 63 

IY+ IT+LVG+TP+IKLN ++P AA+VYVKLEAFNPGSSVKDRIAL+MIE AE G I 
Sbjct: 3 IYQNITQLVGKTPVIKLNNIVPEGJ\AE^/YVKLEAFNPGSSVKDRIALAMIEDAEKAGTIK 62 



Query: 64 PGDVIIEPTSGOTGIGLAWVGAAKGYRVIIVMPETMSLERRQIIQAYGAELVLTPGAEGM 123

PGD I+EPTSGNTGIGLAWVGAAKGY VIIVMPETMS+ERR+IIQAYGAELVLTPG+EGM 
Sbjct: 63 PGDTIVEPTSGNTGIGLAWVGAAKGYNVIIVMPETMSVERRKIIQAYGAELVLTPGSEGM 122

Query: 124 lETLAIELGAMMPMQFNNPANPSIHEKTTAQEILEAFKEISLDAFVSGVGTGGT 183

 i+ +A E W+P QF NP+NP +HE TT QEILE F LDAFVSGVGTGGT

Query: 184 LSGVSHVLKKMPETVIYAVEAEESAVLSGQEPGPHKIQGISAGFIPNTLDTKAYDQIIR 243

 +SGVSHVLK ANP+ IYAVEA+ESAVLSG+ PGPHKIQGISAGFIP+TLDT AYD IIR
Sbjct: 183 VSGVSHVLKTANPDIAIYAVEADESAVLSGEAPGPHKIQGISAGFIPDTLDTSAYDGIIR 242

 VKS DAL T R G KEGFLVGISSGAA++AA:EVAK+LG GK VL ILPDNGERYLST

An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/306 (72%) , Positives = 263/306 (85%) 

Query: 1 MSKIYNSITDLIGNTPIIQLHHIVPEGAAEVYVTCLESFNPGSSVKDRIALAMIEDAEQKG 60 

M+KIY +IT+L+G TPII+L+ ++P AA+VYVKLE+FNPGSSVKDRIAL+MIE AE +G 
Sbjct: 1 MTKIYKTITELVGQTPIIKMIRLIPISIEAADVWKLEAFNPGSSVKDRIALSMIEAAEAEG 60 

Query: 61 ILKAGDTIVEPTSGiNTGIGLAWVGKAKGYNVIIVMPETMSIERRKIIQAYGAQLVLTPGS 120 

++ GD r+EPTSGNTGIGIiAWVG AKGY VI IVMPETMS+ERR+ 1 IQAYGA+LVLTPG+ 
Sbjct: 61 LISPGDVIIEPTSGOTGIGLAWVGAAKGYRVIIVMPETMSLERRQIIQAYGAELVLTPGA 120 

Query: 121 EGMKGAIAKAKEISAEQNAWLPLQFNNQANPEIHEKTTGREIIETFGEKGLDAFIAGVGT 180 

EGMKGAIAKA+ ++ E AW+P+QFNK ANP IHEKTT +EI+E F E LDAF++GVGT 
Sbjct: 121 EGMKGAIAKZ\ETLAIELGAWMPMQFKNPANPS IHEKTTAQE ILEAFKEISLDAFVSGVGT 180 

Query: 181 GGTITGVSRALKKYNPDVAIYAVEADESAILSGEQPGPHKIQGISAGFIPETLATDSYDH 240 

GGT++GVS LKK NP+ IYAVEA+ESA+LSG++PGPHKIQGISAGFIP TL T +YD 
Sbjct: 181 GGTLSGVSHVLKKANPETVIYAVEAEESAVLSGQ3PGPHKIQGISAGFIPNTLDTKAYDQ 240 

Query: 241 IIRVTSDDAIETGRIIGGLEGFLAGISASAAIYAAIEVAKQLGKGKKVLALLPDNGERYL 300 

IIRV S DA+ET R+ G EGFL GIS+ AA+YAAIEVAKQLGKGK VL +LPDNGERYL 
Sbjct: 241 IIRVKSKDALETARLTGAKEGFLVGISSGAALYAAIEVAKQLGKGKHVLTILPDNGERYL 300 

Query: 301 STSLYD 306 

ST LYD 
Sbjct: 301 STELYD 306 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1004 

A DNA sequence (GBSx1064) was identified in S.agalactiae <SEQ ID 3083> which encodes the amino
acid sequence <SEQ ID 3084>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3666 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07349 GB:AP001519 unknown conserved protein [Bacillus halodurans] 
Identities = 95/204 (47%), Positives = 127/204 (62%)

Query: 2 NYKTIKSDGIVEEEIKKSRFICHLKRVESEEEGKNYITQIKKAHYKANHSCSRMVIGEKG 61
 +Y T+K GI E I+KSRFI HL R SEEE +1 QIKK H+ A H+CSA + IGE
Sbjct: SYYTVKESGIHEISIQKSRFIAHLSRATSEEEAIQFIEQIKKEHWNATHNCSAYLIGEND 63

 H++DDGEPSGTAG+PML VL+K+ L + VAWTRYFGG+KLGAGGLIRAY +V++

 E +LE V + YV EE

Query: 182 TITNLTEFYQGKALLTEEGSQIVE 205

 +T G+A T + +E
Sbjct: YCEWMTNLTNGQAAFTHGAIEYLE 207



A related DNA sequence was identified in S.pyogenes <SEQ ID 3085> which encodes the amino acid 
sequence <SEQ ID 3086>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9153> which encodes the amino acid sequence
<SEQ ID 9154>. Analysis of this protein sequence reveals the following:

Possible site: 31 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.43 Transmembrane 81 - 97 ( 81 - 97) 

Final Results

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 122/206 (59%) , Positives = 153/206 (74%) 

Query: 2 NYKTIKSDGIVEEEIKKSRFICHLKRVESEEEGRNYITQIKKAHYKANHSCSAMVIGEKG 61 
++KTIK+ G EE IKKSRFICH+KRV +EE+G+N++ IKK HYKANHSC AM+IG 
Sbjct: 8 HFKTIKRSGFFEESIKKSRFICHIKRVSTEEDGI<NFVIiIAIKKEHYKANHSCFAMIIGNNR 67

Query: 62 DIKRSSDDGEPSGTAGIPMLTVLEKQGLTNVVAVVTRYFGGIKLGAGGLIRAYSGSVANT 121 

I KRSSDDGEPSGTAGIP+L+VLEKQ LTNW WTRYFGGI KLG GGLIRAYS A 
Sbjct: 68 QIKRSSDDGEPSGTAGIPILSVLEKQCLTNVVVA/'VTRYFGGIKLGTGGLIRAYSNMTATA 127 


Query: 122 IKEIGWEVKEQIGIRIQLTYPQYQTFDNFLKEHHLQEFETEFLEAVTCKIYVDPKEFEH 181 

IK G++EVK+QIG+ I L+YPOYQ + N L + L E ET+F + + +Y D + E+ 
Sbjct: 128 IKRFGIIEVKQQIGLEITLSYPQYQLYSNLLDQLALTETETKFSDTIKTTLYCDTERVEN 187 

Query: 182 TITNLTEFYQGKALLTEEGSQIVEIP 207

I LT +Y G+ + GS+++E P 
Sbjct: 188 LIDTLTNYYHGQISCEKIGSKVIEFP 213 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1005

A DNA sequence (GBSx1065) was identified in S.agalactiae <SEQ ID 3087> which encodes the amino
acid sequence <SEQ ID 3088>. Analysis of this protein sequence reveals the following: 

Possible site: 45 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1421 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44940 GB:U56901 involved in transformation [Bacillus subtilis]
Positives = 228/405 (55%), Gaps = 20/405 (4%)

Query: 35 YICTRCSSSVAKNCQL- 
YCEC + + 

Sbjct: 58 

Query: 89 KGELTPYQNEVSEELLKGISSKENLLVHAVTGAGKTEMIYHSVAKVIDTGGSVCIASPRI 148 

G+L+ Q + + L++ IS KE LL+ AV GAGKTEM++ + ++ G VCIA+PR 
Sbjct: 118 DGKLSSGQQKAANVLIEAISKKEELLIWAVCGAGICrEMLFPGIESALNQGLRVCIATPRT 177 

Query: 149 DVCLELYKRLSNDFRCA-ITLMHGESPSYQR-SPLTIATTHQLLKFYHAFDLLIVDEVDA 206 

DV LEL RL F+ A 1+ ++G S R SPL I+TTHQLL++ A D++I+DEVDA 
Sbjct: 178 DWLELAPRLKAAFQGADISALYGGSDDKGRLSPLMISTTHQLLRYKDAIDVM1IDEVDA 237 

Query: 207 FPYVDNPILYQGVKOALKENGTSIFLTATSTTELERKVARKELKKLHLARRFHANPLVIP 266 

FPY + L V++A K+N T ++L+AT EL+RK +L + + R H PL P 
Sbjct: 238 FPYSADQTLQFAVQKARKKNSTLVYLSATPPKELKRKALNGQLHSVRIPARHHRKPLPEP 297 

Query: 267 EMVWVSGIQKSLQTQKLPPKLYQLINKQRQTRYPLLLFFPHISEGQVFTEILRQAFPMEK 326 

VW +K L K+PP 4- + I + P+ LF P +S IL +A K 

Sbjct: 298 RFWCGNWKKKLNRTIKIPPAVKRWIEFHVKEGRPVFLFVPSVS ILEKAAACFK 350 



Query: 382 SSLVQI SGRVGRALERPEGLLYFLHDGKS KSMHQAI KEI KNMNHI 426 

S+LVQI+GR GR E +G + + H GK+KSM A K IK UN t 
Sbjct: 411 SALVQIAGRTGRHKEYADGDVIYFHFGKTKSMLDARKHIKEMNEL 455 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3089> which encodes the amino acid 
sequence <SEQ ID 3090>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

3 N-terminal signal sequence 

Transmembrane 304 - 320 ( 303 - 322) 



Final Results

bacterial membrane --- Certainty=0.2635 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

GB:U56901 involved in transformation [Bacillus subt... 258 1e-67

>GP:AAC44940 GB:U56901 involved in transformation [Bacillus subtilis]
Identities = 155/435 (35%), Positives = 249/435 (56%), Gaps = 20/435 (4%)

Query: 10 RLLLESQLPDSAKQLAQPLK S WILRGKMI CQRCHYQLDEEA RLPSG 56 
 R LL ++L S + + +K S+ I + + C RC Q D+

Sbjct: 22 RHLLRTELSFSDEMIEWHIKNGYITAENSISINKRRYRCNRCG-QTDQRYFSFYHSSGKN 80 

Query: 57 AYYCRFCLVFGRKQSDKLLYAIPPMHFP--KGNYLWGGQLTAYQEMISQQLLINMQNQK 114 

YCR C++ GR + LY+ + K L W G+L++ Q+ + L+ + ++ 
Sbjct: 81 KLYCRS CVMMGRVSEEVPLYSWKEENESNWKS I KLTWDGKLSSGQQKAANVLIEAI SKKE 140 

Query: 115 TTLVHAVTGAGKTEMIYAAIEAVIHTGGWVCIASPRVDVCVEVATRLSQAFS-CSICLMH 173 

L+ AV GAGKTEM++ IE+ +N G VCIA+PR DV +E+A RL AF I ++ 
Sbjct: 141 ELLIWAVCGAGKTEMLFPGIESALNQGLRVCIATPRTDVVLEIAPRLKAAFQGADISALY 200 

Query: 174 AESLPYQR-APIIVATTHQLLKFHKAFDLLIIDEVDAFPFVHNIQLHYAASQALKEGGAK 232 

S R +P++++TTHQLL++ A D++11DEVDAFP+ + L +A +A K+ 
Sbjct: 201 GGSDDKGRLSPLMISTTHQLLRYKDAIDVI'IIIDEVDAFPYSADQTLQFAVQKARKKNSTL 260 

Query: 233 ILLTATSTRTLERKVNKGEWKLTLARRFHNRPLVIPKFIRSFNLFKMIHRQKLPLKILK 292 

+ L+AT + L+RK G++ + + R H +PL P+F+ N K ++R K+P + + 
Sbjct: 261 VYLSATPPKELKRKALNGQLHSTOIPARHHRKPLPEPRFWCG^KKKLNRNKIPPAVKR 320 

Query: 293 YLKKQRKTGYPLLIFLPTIIMAESVTAILKELLPAEQIACVSSQSQNRKEDITAFRQGKK 352 

+++ K G P+ +F+P++ + E AK+ +AV++ ++RKE + FR G+ 
Sbjct: 321 WIEFHVKEGRPVFLFVPSVSILEKAAACFKGV--HCRTASVHAEDKHRKEKVQQFRDGQL 378 

Query: 353 TILITTSILERGVTFPQIDVFVLGSHHRVYSSQSLVQIAGRVGRSIDRPDGTLYFFHEGI 412 

+LITT+ILERGVT P++ VLG+ +++ +LVQIAGR GR + DG + +FH G 
Sbjct: 379 DLLITTTILERGVTVPKVQTGVLGAESSIFTESALVQIAGRTGRHKEYADGDVIYFHFGK 438 

Query: 413 SKAMLIARKEIKEMN 427 

+K+ML ARK IKEMN 
Sbjct: 439 1 



An alignment of the GAS and GBS proteins is shown below. 
Identities = 223/427 (52%) , Positives = 299/427 (69%) 

Query: 1 MENYLGRLWTKAQLSEQLRKIAISLPSFIKKGSDYICTRCSSSVAKNCQLPTGNYYCREC 60

 +EN GRL ++QL + +++A L S + IC RC + + +LP+G YYCR C

Sbjct: 4 IENSYGRBLLESQLPDSAKQIAQPDKSWILRGKMICQRCHYQLDEEARLPSGAYYCRFC 63

Query: 61 IVFGRVTSI^NLYYFPQKTFSKTNSLKWKGELTPYQNEVSEELLKGISSKENLLVHAVTG 120

 +VFGR S++ LY P F K N L W G+LT YQ +S++LL + +++ LVHAVTG
Sbjct: 64 LVFGRNQSDKLLYAIPPMHFPKGNYLVWGGQLTAYQEMISQQLLINMQNQKTTLVHAVTG 123

Query: 121 AGKTEMIYHSVAKVIDTGGSVCIASPRIDVCLELYKRLSNDFRCAITLMHGESPSYQRSP 180

 AGKTEMIY ++ VI+TGG VCIASPR+DVC+E+ RLS F C+I LMH ES YQR+P
Sbjct: 124 AGKTEMIYAAIEAVIIWGGWCIASPRVDVCTEVATRLSQAFSCSICLMHAESLPYQRAP 183

Query: 181 LTIATTHQLLKFYHAFDLLIVDEVDAFPYVDNPILYQGVKQALKENGTSIFLTATSTTEL 240

 + +ATTHQLLKF+ AFDLLI +DEVDAFP+V+N L+ QALKE G I LTATST L
Sbjct: 184 IIVATTHQLLKFHKAFDLLIIDEVDAFPFVNNIQLHYAASQALKEGGAKILLTATSTRTL 243

Query: 241 ERKVARKELKKLHIARRFHANPLVIPEMVWSGIQXSLQTQKLPPKLYQLINKQRQTRYP 300

 ERKV + E+ KL IARRFH PLVIP+ + + K + QKLP K+ + + KQR+T YP
Sbjct: 244 ERKVNKGE WKLTLARRFHNRPLVI PKF I RS FNLFKMI HRQKLPLKILKYLKKQRKTGYP 303

Query: 301 LLLFFPHISEGQVFTEILRQAFPMEKIGFVSSKST3RLKLVQDFRDNKLSILVSTTILER 360

 LL+F PI + T IL++ P E+I VSS+S +R + + FR K +IL++T+ILER
Sbjct: 304 LLIFLPTIIMAESVTAILECELLPAEQIACVSSQSQNRKEDITAFRQGKKTILITTSILER 363

Query: 361 GVTFPSVDVFVIQAHHHLFTKSSLVQISGRVGRALERPEGLLYFLHDGKSKSMHQAIKEI 420
GVTFP +DVFV+ ++H +++ SLVQI +GRVGR+++RP+G LYF H+G SK+M A KEI 
Sbjct: 364 GVTFPQIDVFVLGSHHRVYSSQSLVQIAGRVGRSIDRPDGTLYFFHEGISKAMLLARKEI 423 

Query: 421 KNMNHIG 427 

K MN+ G 
Sbjct: 424 KEMNYKG 430 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1006 

A DNA sequence (GBSx1066) was identified in S.agalactiae <SEQ ID 3091> which encodes the amino
acid sequence <SEQ ID 3092>. This protein is predicted to be comF operon protein 3 (comFC). Analysis of
this protein sequence reveals the following: 

N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.0894 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MTCLLCHEIDLSQLTFVELMLLKPKQNVICQTCKGSFEALSREMGCQTCCK-QIPQKQCQ 59 

M CLLC +T+ L LLKP + V C +C+ + ++ + C C + Q C+ 

Sbjct: 1 MICLLCDSQFSQDVTWRALFLLKPDEKV- CYSCRSKLKKITGHI - CPLCGRPQSVHAVCR 58 

Query: 60 DCIYWGKKGIEV NHFSLYRYNEAMKKNFSLFKFQGDYLLKDVFTKEIKAALKKY- - 113 

DC W + + + S+Y YN+ MK+ S FKF+GD + + F + + K 

Sbjct: 59 DCEVWRTRIRDSLLLRQNRSVYTYNDMMKETLSRFKFRGDAEIINAFKSDFSSTFSKVYP 118 

Query: 114 -KGYTIVPVPLSHEGYQNRQFNQVIAFLQSAN1PYKNILSKKDGGKQSANNKEERLKQVQ 172 

K + +VP+PLS E 4- R FNQ + + P + L + +■ KQS K ERL 

Sbjct: 119 DKHFVLVPIPLSKEREEERGFNQAHLLAEQjDRPSHHPLIRLISnSIEKQSKKKKTERIjLSEC 178 

Query: 173 QFTLKNEAELGDNLLIVDDIYTTGATIAQIRKLLEEKG-IKNIKSFSLAR 221 

F KN + G N++++DD+YTTGAT+ + Ii EKG ++ SF+L R 
Sbjct: 179 IFDTKNNSAEGMNIILIDDLYTTGATLHFA7ARCLLEKGKAASVSSFTLIR 22S 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3093> which encodes the amino acid 
sequence <SEQ ID 3094>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0763 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 100/222 (45%) , Positives = 139/222 (62%) , Gaps = 2/222 (0%) 

Query: 1 MTCLLCHEIDLSQLTFVELMLLKPKQNVICQTCKGSFFALSREMGCQTCCKQIPQKQCQD 60 

M CLLC +1 + ++ E++ L+ + ICQ C+ SF+ + + + C TCC C+D 
Sbjct: 1 MICLLCQQISQTPISITEIIFLRRISSPICQQCQKSFQKIGKSV-CATCCANSDIIACRD 59 

Query: 61 CIYWGKKGIEVmFSLYRYNESMKKNFSLFKFC£DYI^^ 119 

C+ W KG VNH SLY YN AMK FS +KFQGDYLL+ VF E+ + KY KGY V 
Sbjct: 60 CLECWFJ^KGYNVNHRSLYCTNAAMKAYFSQYKFQGDYLLRIWFAVEIiADVITKYYKGYIPV 119 

Query: 120 PVPLSHEGYQmQFNQVIAFLQSANIPYKNILSKKDGGKQSANNKEERLKQVQQFTLKNE 179 

PVP+S ++ RQFNQV A L++AN+ Y ++ K D QS+ K+ERL + + L 
Sbjct: 120 PVPVSPGCFRERQFNQVSAILFAANVSYLSLFEKLDNTHQSSRTKKERLLVEKSYRLLKV 179 

Query: 180 AELGDNLLIVDDIYTTGATIAQIRKLLEEKGIKKIK3FSLAR 221

+ + D +LIVDDIYTTG+TI +RK L + +IKS S+AR 
Sbjct: 180 SNI PDKILIVDDI YTTGSTI IALRKQLAKVANSDIKSLS IAR 221 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1007 

A DNA sequence (GBSx1067) was identified in S.agalactiae <SEQ ID 3095> which encodes the amino
acid sequence <SEQ ID 3096>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3889 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB91549 GB:AJ249134 hypothetical protein [Lactococcus lactis] 
Identities = 107/185 (57%) , Positives = 140/185 (74%) , Gaps = 3/185 (1%) 



IVRTK+V LKPMD EEA+LQM++LGHDF+++TDA+ N T+V+Y+R DG G 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3097> which encodes the amino acid 
sequence <SEQ ID 3098>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3751 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/182 (79%) , Positives = 165/182 (89%) 

Query: 1 MIKYSIRGENIEVTEAIREYVETKLSKVEKYFTTEAQELDTRVNLKA/YREKTAKVEVTILI 60 

MIK+SIRGENIEVTEAIR+YVE+KL+K+EKYF + QE+D RVNLKVYRE+++KVEVTI + 
Sbjct: 1 MIKFSIRGENIEVTEAIRDYVESIOjTKIEKYFAKEQEIDARVNLKVYRERSSKVEVTIPL 60 

Query: 61 DSITLRAEDVSQDMYGSIDLWDKIERQIRKNKTKIAKKYREKIPASQVFTTEFEAEPDE 120 

DS+TLRAEDVSQDMYGSIDLWDKIERQIRKNKTKIAKK+REK+P QVFTTEFEAE + 
Sbjct: 61 DSOTLRAEDVSQDMYGSIDLVVDKIERQIRKNKTKIAKKHREKVPTGQVFTTEFEAEEVD 120 

Query: 121 EAVSQRIWTKmmLKPMDVEE^LQ^LMHDFFIYTDAEDNTimLYKREDGELGLIE 180

E ++VRTKNV LKPMDVEEA LQMELLGHDFFIYTD+ED TN+LY4REDG LGLIE 
Sbjct: 121 E I PEVQVVRTKNVTLKPMDYEEARLQMEIiLGHDFFIYTDSEDGATNI LYRREDGNLGLI E 180 

Query: 181 AK 182

AK 

Sbjct: 181 AK 182 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1008 

A DNA sequence (GBSx1068) was identified in S.agalactiae <SEQ ID 3099> which encodes the amino
acid sequence <SEQ ID 3100>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0685 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1009 

A DNA sequence (GBSx1077) was identified in S.agalactiae <SEQ ID 3101> which encodes the amino
acid sequence <SEQ ID 3102> (sgaT). Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.95 Transmembrane 99 - 115 ( 87 - 115) 
INTEGRAL Likelihood = -3.50 Transmembrane 43 - 59 ( 42 - 60) 
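
Each "INTEGRAL" line reports a likelihood score and the residue positions of one predicted transmembrane segment, followed by a second range in parentheses. A minimal Python sketch for extracting these values is given below. The names `Segment` and `parse_transmembrane` are hypothetical, and the parenthesised range is stored as printed (here called the "outer" range) without further interpretation.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int  # first number of the parenthesised range, as printed
    outer_end: int    # second number of the parenthesised range, as printed


# Example line:
#   "INTEGRAL Likelihood = -5.95 Transmembrane 99 - 115 ( 87 - 115)"
_TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(text: str) -> List[Segment]:
    """Return all fully printed segments, ordered by start position."""
    segments = [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in _TM_LINE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


if __name__ == "__main__":
    report = (
        "INTEGRAL Likelihood = -5.95 Transmembrane 99 - 115 ( 87 - 115)\n"
        "INTEGRAL Likelihood = -3.50 Transmembrane 43 - 59 ( 42 - 60)\n"
    )
    for seg in parse_transmembrane(report):
        print(seg.start, seg.end, seg.likelihood)
```

Lines that were truncated during scanning do not match the pattern and are skipped rather than guessed at.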

Final Results 

bacterial membrane --- Certainty=0.3378 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03942 GB:AP001507 unknown conserved protein [Bacillus halodurans] 
Identities = 47/111 (42%) , Positives = 76/111 (68%) , Gaps = 5/111 (4%) 

Query: 1 MAIIYLIVAVFAG- -EAYIAKEI SNGVNGLVYALQLAGQFAAGVFVILAGVRLILGE 55 

M I++L+ A+ + A+E+ S + +YA+ + FA G+ V+L GV++ +GE 

Sbjct: 233 MGILFLVGAI ILALKDTQGAQELIAQSGEQSFFIYAI IQSFMFAGGIAWLLGVKMFIGE 292 

Query: 56 IVPAFKGISEKLVPNSKPALDCPIVYPYAPNAVLIGFISKFVGGLVSMIVM 106 

+VPAF GI+ KLVP ++PALD P+V+P APNAV++GF+ FVG L+ ++V+ 
Sbjct: 293 WPAFNGIATKLVPGARPALDAPWFPMAPNAVILGFLGAFVGALIWLWI 343 

There is also homology to SEQ ID 516. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1010 

A DNA sequence (GBSx1078) was identified in S.agalactiae <SEQ ID 3103> which encodes the amino
acid sequence <SEQ ID 3104>. This protein is predicted to be tryptophanyl-tRNA synthetase (trpS). 
Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2156 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC05711 GB:L49336 tryptophanyl-tRNA synthetase [Clostridium 
longisporum] 

Identities = 225/340 (66%) , Positives = 271/340 (79%) , Gaps = 3/340 (0%) 

Query: 1 MTKPIILTGDRPTGKLHIGHYVGSLKNRVLLQNEGSYTLFVFLADQQALTDHAKDPQTIV 60
 M K I ILTGDRPTGKLHIGHYVGSLKNRV LQN G Y F+ +ADQQALTD+A++P+ I
Sbjct: 1 MAKEIILTGDRPTGKLHIGHYVGSLKNRVQLQNSGDYRSFIMIADQQALTDNARNPEKIR 60

 F SIPAGFL+YPV+QAADITAFKA VPVG EQ PMIEQ REIVRSFN Y +VLVE

 GRLPG DG AKMSKS+ N I+LAD+ D +K+KVMSMYTDPNHIKV +PG

 Q+EGN VF YLD F +D + E MK HY +GGLGDVK K++L +IL+ EL PIR

 E+ KD+ +VY++L++GSEKA+ VAA TL EV+ +G+ YF



A related DNA sequence was identified in S.pyogenes <SEQ ID 3105> which encodes the amino acid
sequence <SEQ ID 3106>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2737 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 290/340 (85%), Positives = 316/340 (92%)



Query: 1 MTKPIILTGDRPTGKLHIGHYVGSLKNRVXiIiQNEGSYTLFVFLADQOALTDHAKDPQTIV 60 
MTKPI ILTGDRPTGKLH+GHYVGSLKNRV LQNE Y +FVFLADQQALTDHAK+ + I 
Sbjct: 2 MTKPIILTGDRPTGKLHLGHWGSLKNRVFLQNENKYKMFVFLADQQALTDHAKESELIQ 61

Query: 61 ESIGlTOALDyiAVGLDPNKSTLFIQSQlPEIMLSMYYMI^VSl^LERMPTVKTEIAQK 120 

ESIGNVALDYL+VGLDP +ST+FIQSQIPELAELSMYYMNLVSLARLERKPTVKTEIAQK 
Sbjct: 62 ESIGWALDYLSVGLDPKQSTIFIQSQIPEI^LSMYYMNLVSLARLERNPTVKTEIAQK 121 

Query: 121 GFGES I PAGFLVYPVAQAAD ITAFKANLVP VGTDQKPM IEQTRE IWS FNHAYNCQVLVE 180 

GFGES I P+GFLVYPV+QAAD ITAFKANLVPVG DQKPMIEQTREIVRSFNH Y+ LVE 
Sbjct: 122 GFGESIPSGFLVYPVSQAADITAFKANLVPVGNDQKPMIEQTREIVRSFHHTYHTDCLVE 181 



Query: 241 QIEGNMVFHYLDVFGRDEDQKE1TAMKEHYQKGGLGDVKTKRYLLDILERELSPIRERRL 300 
QIEGNMVFHYLD+F R EDQ +1 AMKEHYQ GGLGDVKTKRYLLDILEREL+PIRERRI, 
Sbjct: 242 QIEGNMVFHYLDIFARKEDQADIEAMKEHYQIGGLGDVKTKRYLLDILERELAPIRERRL 301

Query: 301 EYAKDMGQVYQMLQKGSEKAQAVAASTLDEVKSAMGLiNYF 340 

EYAKDMG+V+4MLQ+GS+KA+ VAA TL EVKSAMG+NYF 
Sbjct: 302 EYAKDMGEVFRMLQEGSQKARTVAAKTLSEVKSAMGINYF 341 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1011 

A DNA sequence (GBSx1079) was identified in S.agalactiae <SEQ ID 3107> which encodes the amino
acid sequence <SEQ ID 3108>. This protein is predicted to be carbamate kinase. Analysis of this protein 
sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0013 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA04684 GB:AJ001330 carbamate kinase [Lactobacillus sakei] 
Identities = 199/311 (63%), Positives = 254/311 (80%), Gaps = 3/311 (0%) 

Query: 6 QKIVVALGGNAILSTDASAKAQQEALINTSKSLVKLIKEGHDVIVTHGNGPQVGNLLLQQ 65
 +KIWALGGNAILSTDASA AQ +A+ T K LV +K+G +I++HGNGPQVGNLL+QQ
Sbjct: 4 RKIWALGGNAILSTDASANAQIKAVKETVKQLVAFVKQGDQLIISHGNGPQVGNLLIQQ 63

Query: 66 AASDSEKNPAMPLDTCVAMTEGSIGFWLQNALNNELQEQGIDKEVATVVTQVIVDEKDQA 125
 AASDSEK PAMPLDT AM++G IG+W+QNA N L E+G+ +VAT+VTQ IVD KD+A

 F NPTKPIGPF SE +AKKQ + F EDAGRGWR+WPSP+P+GI+EA VI++LV+

 ISAGGGGVPV ++ N L+GVEAVIDKDFAS+ L+ELV AD+ I+LT VDNV+V

 NFNKP+Q+KL V+V++++ YI ++QFA GSMLPK++ AI +V N+P+S+AIITSL+N+

A related DNA sequence was identified in S.pyogenes <SEQ ID 3109> which encodes the amino acid
sequence <SEQ ID 3110>. Analysis of this protein sequence reveals the following: 



3 N-terminal signal 

Final Results

bacterial cytoplasm --- Certainty=0.0013 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 275/312 (88%) , Positives = 295/312 (94%) 

Query: 6 QKIWALGGNAILSTDASAKAQQEALINTSKSLVKLIKEGHDVIVTHGNGPQVGlvILLLQQ 65 

QKIWALGGNAILSTDASAKAQQEALI+TSKSLVKLIKEGH+VIVTHGNGPQVGNLLLQQ 
Sbjct: 4 QKIWALGGNAILSTDASAKAQQEALISTSKSLVKLIKEGHEVIVTHGNGPQVGNLLLQQ 63 

Query: 66 AASDSEKlJPAMPLDTCVAMTEGSIGFWLQNAIAmLQEQGIDKEVATVVTQVIVDEKDQA 125 

AA+DSEKNPAMPLDTCVAMTEGSIGFWL NAL+NELQ QGI KEVA WTQVIVD KD A 
Sbjct: 64 AAADSEKNPAMPLDTCVAMTEGSIGFV&Vt^DNELQAQGIQKEVAAVVTQVIVDAKDPA 123 

Query: 126 FTNPTKPIGPFLSEEDAKKQAQETGSKFKEDAGRGWRKWPSPKPVGIKEASVIRRLVDS 185 

F NPTKPIGPFL+EEDAKKQ E+G+ FKEDAGRGWRKWPSPKPVGXKEA+VIR LVDS 
Sbjct: 124 FENPTKPIGPFLTEEDAKKQMAESGASFKEDAGRGWRKWPSPKPVGIKEANVIRSLVDS 183 

Query: 186 GVWISAGGGGVPVIEDANTKALKGVEAVIDKDFASQTLSELVDADLFIVLTGVDNVFVN 245 

GVW+SAGGGGVPV+EDA +K L GVEAVIDKDFASQTLSELVDADLFIVLTGVDNV+VN 
Sbjct: 184 GVVWSAGGGGVPVvEDATSKTLTGVEAVIDKDFASQTLSELVDADLFIVLTGVDNVYvN 243 

Query: 246 FNKPNQEKLEEVTVSQMKQYITENQFAPGSMLPKVEAAIAFVENKPESRAIITSLENIDN 305 

FNKP+Q KLEEVTVSQMK+YIT++QFAPGSMLPKVEAAIAFVENKP ++AIITSLENIDN 
Sbjct: 244 FNKPDQAKXjEEVTVSQMKEYITQDQFAPGSMLPKVEAAIAFVENKPNAKAIITSLENIDN 303 

Query: 306 VLAQNAGTQIVA 317 

VL+ NAGTQI+A 
Sbjct: 304 VLSANAGTQIIA 315 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
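
The alignment summaries in these Examples report counts together with percentages (for instance, Identities = 275/312 (88%)). As an illustrative sketch only, and not part of the original analysis, the following Python snippet parses such a summary line and checks that each reported percentage agrees with its count to within one percentage point. The function names and the one-point tolerance are assumptions made for this illustration.

```python
import re

# Matches summary fields such as "Identities = 275/312 (88%)".
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in SUMMARY_RE.finditer(line)
    }


def consistent(count, length, reported, tolerance=1.0):
    """True when the reported percentage is within `tolerance` points of count/length."""
    return abs(100.0 * count / length - reported) <= tolerance


if __name__ == "__main__":
    fields = parse_summary("Identities = 275/312 (88%) , Positives = 295/312 (94%)")
    for name, (count, length, reported) in fields.items():
        print(name, count, length, reported, consistent(count, length, reported))
```

A check of this kind is mainly useful for spotting digits that were damaged during text extraction; it does not say which of the numbers is the damaged one.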

Example 1012 

A DNA sequence (GBSx1080) was identified in S.agalactiae <SEQ ID 3111> which encodes the amino
acid sequence <SEQ ID 3112>. This protein is predicted to be permease (potE). Analysis of this protein 
sequence reveals the following: 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12… Transmembrane 4… - 466 ( 441 - …)
INTEGRAL Likelihood = -8… Transmembrane 2… - 252 ( 231 - …)
INTEGRAL Likelihood = -8… Transmembrane 2… - 299 ( 277 - …)
INTEGRAL Likelihood = -8… Transmembrane 1… - 181 ( 153 - 186)
INTEGRAL Likelihood = -7… Transmembrane 1… - 145 ( 126 - 151)
INTEGRAL Likelihood = -6… Transmembrane 3… - 412 ( 394 - 415)
INTEGRAL Likelihood = -5… Transmembrane … - 61 ( 38 - …)
INTEGRAL Likelihood = -4… Transmembrane 3… - 351 ( 334 - …)
INTEGRAL Likelihood = -3… Transmembrane … - 29 ( 10 - …)
INTEGRAL Likelihood = -2… Transmembrane …
INTEGRAL Likelihood = -1… Transmembrane 360 - 376 ( 360 - …)
INTEGRAL Likelihood = -0.53 Transmembrane 207 - 223 ( 207 - 223)

Final Results

bacterial membrane --- Certainty=0.6052 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10295> which encodes amino acid sequence <SEQ ID
10296> was also identified.

The protein has homology with the following sequences in the GENPEPT database.



Query: 5 MEKEKKLGLLPLTMLVIGSLIGGGIFDLMQNMSSRAGLVPMLIAWVITAIGMGTFVLSFQ 64 

M +EKKLGL L LVIGS+IGGG F+L +M+S AG +LI W+IT +GM SFQ 
Sbjct: 1 MAEEKKLGrjFALIALVIGSMIGGGAFNLASDMASGAGAGAILIGWIITGVGMIAIiAFSFQ 60 

Query: 65 NLSEKRPDLTAGIFSYAICEGFGNFMGFNSAVJGYWLSAVJLGNVAYAALLFSSLGYFFKFFG 124 

NL+ KRPDL GIF+YA+EGFG+FMGFNS WGYW +A LGNVAY LLFS++GYF FG 
Sbjct: 61 NLTTKRPDLDGGIFTYAREGFGHFMGFNSGWGYWFAALLGNVAYGTLLFSAIGYFIPAFG 120 

Query: 125 NGNNI IS I IGAS IVIWWHFLILRGVNTAAFINTI VTFAKLVPVI IFL1 SALLAFKFNI F 184 

+G NI SIIGAS+++W VHFJjILRGV +AA IN I T +KLVP+ F+I+ + F ++F 
Sbjct: 121 DGQNIASIIGASVILWCVHFLILRGVQSAAMINLITTISKLVPIFAFIIAIIFVFHLDLF 180 

Query: 185 SLDIWGNGLH - QS I FNQVNSTMKTAVWVFIG IEGAWFSGRAKKHSD IGKAS I IiALFTM I 243 
+ D WG GL SI QV STM VWVF GIEGAV+FS RAKK SD+GKA+++ L +++ 
Sbjct: 181 TNDFWGKGLSLGSIGTQWSTMLVTVWVFTGIEGAVLFSSRAKKSSDVGKATVIGLISVL 240

Query: 244 SLYVLISVLSLGIMSRPEI^ANLKTPAmYVLEKAVGHWGAILVNLGVIISVFGAILAWTL 303 

+YV+I ++LSLG+M++ LA L P+MA ++E VG WGA+L+NLG+IISV GA LAWTL 
Sbjct: 241 VIYVMITMLSLGVNMQQNIAELPNPSMAAIMEHIVGKKGAVLINLGLIISVLGAWLAWTL 300 

Query: 304 FAAELPYQAAKEGAFPKFFAKENKNKAPINSLLVTNLCVQAFLITFLFTQSAYRFGFALA 363 

FA EtiP AA+EG FPK+F KENKN AP N+L +TN +Q FL+TFL + +AY+F F+LA 
Sbjct: 301 FAGELPLIAAREGVFPKWFGKENKNGAPTNALTLTNAIIQLFLLTFLISDAAYQFAFSLA 360 

Query: 364 SSAILI PYAFTALYQLQFTLREDKSTPGHQKNLI IGILATIYAVYLI YAGGFDYLLLTMI 423 

SSAILIPY F+ LYQL+++ + P KNLIIGI+A+IY V+L+YA G DYLLLTMI 
Sbjct: 361 SSAILI PYLFSGLYQLKYSWLHKE - - PNRGKNLI IGIIAS I YG VWLVYAAGLDYLLLTMI 418 

Query: 424 AYTLGMILYIKI1RKDDKLPIFVGYEKISAIVILALCLLCIIEIMTGQIDI 473 

Y G++++ +RK + P+F E + A +IL L ++ +1 + +G I I 
Sbjct: 419 LYAPGILVFRAVRKGKEGPVFNKAELLIAALILVLAVIAVIRLASGSISI 468 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3113> which encodes the amino acid 
sequence <SEQ ID 3114>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.52 Transmembrane 331 - 347 ( 327 - 354) 

INTEGRAL Likelihood = -9.50 Transmembrane 390 - 406 ( 383 - 410) 

INTEGRAL Likelihood = -8.12 Transmembrane 50- 66 ( 45- 75) 

INTEGRAL Likelihood = -7.59 Transmembrane 235 - 251 ( 234 - 262) 

INTEGRAL Likelihood = -6.21 Transmembrane 133 - 149 ( 128 - 151) 

INTEGRAL Likelihood = -5.84 Transmembrane 162 - 178 ( 153 - 183) 

INTEGRAL Likelihood = -2.02 Transmembrane 105 - 121 ( 105 - 121) 

INTEGRAL Likelihood = -1.49 Transmembrane 414 - 430 ( 414 - 431) 

INTEGRAL Likelihood = -0.69 Transmembrane 280 - 296 ( 280 - 296) 

INTEGRAL Likelihood = -0.59 Transmembrane 21 - 37 ( 21 - 37) 

INTEGRAL Likelihood = -0.32 Transmembrane 205 - 221 ( 205 - 222) 
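
The INTEGRAL rows above follow a fixed layout: a likelihood value, a predicted transmembrane segment, and a wider range in parentheses. As a hedged sketch, assuming rows in exactly this layout, the following Python snippet reads them into records and picks out the row with the most negative likelihood value. The names `parse_integral_rows` and `strongest` are hypothetical and chosen for this illustration only; they are not part of the prediction tool that produced the table.

```python
import re

# One row: "INTEGRAL Likelihood =-11.52 Transmembrane 331 - 347 ( 327 - 354)".
ROW_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_rows(text):
    """Return one dict per complete INTEGRAL row found in `text`."""
    rows = []
    for m in ROW_RE.finditer(text):
        start, end, low, high = (int(g) for g in m.groups()[1:])
        rows.append({
            "likelihood": float(m.group(1)),
            "segment": (start, end),   # predicted transmembrane segment
            "range": (low, high),      # wider range given in parentheses
        })
    return rows


def strongest(rows):
    """Return the row with the most negative likelihood value."""
    return min(rows, key=lambda r: r["likelihood"])


if __name__ == "__main__":
    sample = (
        "INTEGRAL Likelihood =-11.52 Transmembrane 331 - 347 ( 327 - 354)\n"
        "INTEGRAL Likelihood = -8.12 Transmembrane 50- 66 ( 45- 75)\n"
    )
    for row in parse_integral_rows(sample):
        print(row)
```

Rows that lost fields during extraction do not match the pattern and are skipped rather than guessed at.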

Final Results

bacterial membrane --- Certainty=0.5607 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAB85052 GB:AE000837 cationic amino acid transporter related protein [Methanobacterium thermoautotrophicum]
Identities = 108/422 (25%) , Positives = 213/422 (49%) , Gaps = 36/422 (8%) 

Query: 26 INAVIGSGI FLLPRAIYKGLGPAS IAVMFGTAILTIMIAVCFAEVSGYFGKNGGAFQYSK 85 
+ ++G+ I+++ LGPASI ++ +++A+ F+E S + GG + Y+

Sbjct: 19 VGTIVGADIYIVAAYGAGSLGPASILAWLIAGLMALIIALVFSEASAMLPRTGGPYVYAG 78 

Query: 86 RAFGDFIGFOTGFLGWTVTIFAWAAMAAGFARMFIITFPAFEGWHIPL SIGL 137 

A G F GF GW++ + +W A+A +F + F + + JPh + 

Sbjct: 79 EALGRFTGF ITGWSLWVSSWVAIA VFPLAFIYYLEYFIPLDPPAEAVIKVLF 130

Query: 138 IILLSLMNIAGLiKTSKIVTITATIAICLIPI'UAFCACTLFFIKNG LPNFTPFVQLEP 193 

1+ L+++NIAG+ + V TI K+ P++ F + + N+TP + 

Sbjct: 131 ILSLTIINIAGVGRAGKVlSroiLTILKVAPVLLFAvLGAIHLALNPGLLVSNYTPAAPMG- 189 


Query: 194 GTNLLGAI SNTAVYI FYGF IGFETLS I VAGEMRDPEKNVPRALLGS I S I VS VLYMLI IGG 253 

LGA-h V +F+ 4-+GFE +++ A E+RDPE+ +P ++ + V++ Y+L 
Sbjct: 190 LGALGTVTVLVFWAYVGFEL VTVPADE VRDPERTI PLS I TLGMI FVTLFYI LTNAV 245 

Query: 254 TIAMLGSQIMMM-APVQDAFVKMIGPAGAVMVSIGALISITGLNMGESIMVPRYGAAIA 312

+ ++ +++ ++ AP+ A ++G GA +++ GA+ SI G + R A++

Sbjct: 246 ILGLVPWRVLRSSTAPLTVAGYSLMGGIGALILTAGAVFSIAGSEEAGMLTTARLLFAMS 305

Query: 313 DEGLLPAAIAKQNQN-GAPLVAILVSGAIAIVLLLTGSFESLAKLSWFRFFQYIPTALA 371
++G LP +++ ++ G P ++ILV A++ LTG+ L +LSW Y T ++

Sbjct: 306 EDGFLPGFLSRVHRRFGTPHMSILVQNLTALLAALTGTVSGLIELSWTLLLPYAVTCIS 365 

Query: 372 VMK1RKDDPDAWIFROTFGPIIPIIAVIVSLVMIWGDNPMNFVYGAVGVIIASSVYYLM 431 
+ LR+ D P+ +L V+V + ++ P +G + +I++ + YL+ 

Sbjct: 366 LAILRRRDGSGI PLKSVLG VLVCI YLLMNTTPSTTAWGLL - L ILSGAPLYLI 416

Query: 432 HG 433 

G 

Sbjct: 417 FG 418 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 104/368 (28%) , Positives = 162/368 (43%) , Gaps = 32/368 (8%) 

Query: 1 MRYKMEKEKKLGLLPLTMLVIGSLIGGGIFDLMQKMSSRAGLVPMLIAWVI-TAIGMGTF 59 
M + +4- K L T+ I ++IG GIF L + + GL P IA + TAI

Sbjct: 6 MNEQEREQAKFSLSGATLYGINAVIGSGIFLLPRRIYK- -GLGPASIAVMFGTAILTIML 63 

Query: 60 VLSFQNLSEKRPDLTAGIFSYAKEGFGNFMGFNSA WGYWLSAWLGNVAYAALLFSSL 116 

+ F +S G F Y+K FG+F+GFN W + AW A A +F 

Sbjct: 64 AVCFAEVSGYFGK-NGGAFQYSKPAFGDFIGFNVGFLGWTVTIFAWAAMAAGFARMFIIT 122

Query: 117 GYFFKFFGNGNNIISIIGASIVIWVvHFLILRGvKTARFINTIVTFAKLVPVIIFLISAL 176 

F+ G +1 IG I++ +++ + G+ T+ + T AKL+P++ F L 
Sbjct: 123 FPAFE GWHIPLSIGLIILLSLMN IAGLKTSKIVTITATIAKLIPIVAFCACTL 175 


Query: 177 LAFK FNIFSLDIWGNGLHQSIFNQWSTMKTAVIWFIGIEGAWFSGRAKKHSDI 231 

K F F G L +1 N TAV++F G G S A + D 

Sbjct: 176 FFIKNGLPNFTPFVQLEPGTNLLGAISN TAVYIFYGFIGFETLSIVAGEMRDP ,228 

Query: 232 GKASILALFTMISLYVLISVLSLG IMSRPELAMLKTPAM-AYVLEKAVGHWGAILVN 287

K AL IS+ ++ +L +G M ++ P A+V K +G GA +V+ 

Sbjct: 229 EKNVPRALLGSISIVSVLYMLI IGGTIAMUSSQIMMTNM'VQDAFV- -KMIGPAGAWMVS 286 

Query: 288 LGVIISVFGAIIAWTLFAAELPYQAAKEGAFPKPFAJCENKNKAPINSLLVTNLCVQAFLI 347 
+G +IS+ G + ++ A EG P AK+N+N AP+ ++LV+ L+

Sbjct: 287 IGALISITGUSMGESIMVPRYGAaiADEGLLPAAIARQNQNGAPLVAILVSGAIAIVLLL 346 






Query: 348 TFbFTQSA 355 

T F A 
Sbjct: 347 TGSFESLA 354 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9079> which encodes the amino
acid sequence <SEQ ID 9080>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = … Transmembrane … - 93 ( 72 - …)
INTEGRAL Likelihood = … Transmembrane … - 295 ( 274 - …)
INTEGRAL Likelihood = … Transmembrane … - 219 ( 199 - …)
INTEGRAL Likelihood = … Transmembrane … - 190 ( 171 - …)
INTEGRAL Likelihood = … Transmembrane … - 452 ( 432 - …)
INTEGRAL Likelihood = … Transmembrane … - 345 ( 324 - …)
INTEGRAL Likelihood = … Transmembrane … - 418 ( 396 - …)
INTEGRAL Likelihood = … Transmembrane … - 476 ( 456 - …)
INTEGRAL Likelihood = … Transmembrane … - 395 ( 377 - …)
INTEGRAL Likelihood = … Transmembrane … - 64 ( 48 - …)
INTEGRAL Likelihood = … Transmembrane … - 259 ( 243 - …)
INTEGRAL Likelihood = … Transmembrane … - 139 ( 123 - …)



bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 



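= 107/250 (42%) ,

[Alignment rows not recovered from the source; surviving coordinates and match-line fragments follow.]
[Coordinates: 143 / Sbjct: 95; Query: 198 / Sbjct: 155; 250 / Sbjct: 214; 315 / 272; 375 / Sbjct: 331]
WG +L L N Y
I 1+ +V++ V ++L
+W A V++ I+++
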

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1013 

A DNA sequence (GBSx1081) was identified in S.agalactiae <SEQ ID 3115> which encodes the amino
acid sequence <SEQ ID 3116>. This protein is predicted to be unnamed protein product (argF). Analysis of
this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.3757 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3117> which encodes the amino acid
sequence <SEQ ID 3118>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = … Transmembrane 171 - 187 ( 171 - 188)

Final Results

bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12563 GB:Z99108 similar to metabolite transporter [Bacillus subtilis] 
Identities = 190/467 (40%) , Positives = 284/467 (60%) , Gaps = 13/467 (2%) 

++FRKK S + R DL LG+G ++GTGIF +TG AA AGPAL I 

Sbjct: 3 SLFRIOCPLETLSAQSKSKSLARTLSAFDLTLLGIGCVIGTGIFVITGTVAATGAGPALII 62 

Query: 80 SIIISAIAIGIIALFYABFASRMPSNGGAYSYVYATLGEFPAWLVGWYIIMEFLTAISSV 139 

S I++ +A + A YAEF+S +P +G YSY Y TLGE A+L+GW +++E++ A+S+V 
Sbjct: 63 SFIIAGIAOU^AAFCYAEFSSSIPISGSVYSYSYVTLGELLAFLIGWDLMLEYVIALSAV 122 

Query: 140 AVGWGSYLKGLIiANyNIvXiPNALNGTFNLKNGTYIDILPvLVMFFVTGIVLMNSKLALRF 199 

A GW SY + LLA +N+ +P AL G G ++ +++ +T IV K + RF 

Sbjct: 123 ATGWSSYFQSLLAGFNLHIPAALTGAPGSMAGAVFNLPAAVIILLITAIVSRGVKESTRF 182 

Query: 200 NSFLVILKFSALALFIWGIFFIDHNNWSHFAPYGVGQITGGKTGIFAGASVMFFAFLGF 259 

N+ +V++K + + LFI VGI ++ +NWS F P+G+ G+ A+ +FFA+LGF 

Sbjct: 183 NNVIVLMKIAIILLFIIVGIGYVKPDNWSPFMPFGM KGVILSAATVFFAYLGF 235 

Query: 260 ESISMAVDEVKEPQKTIPKGIILSLIIVTALYIVVTTILTGIvHYTKlNVPDAVAFALRN 319 

+++S A +EVK PQK +P Gil +L + T LYI V+ +LTG++ Y KLNV D V+FAL+ 
Sbjct: 236 DAVSNASEE VKNPQKNMPVG 1 1 SAIAVCTVLY lAVSLVLTGMMPYAKLNVGDPVS FALKF 295 

Query: 320 IRLYWAADYVSIVAILTLITVCISMTYALARTIYSISRDGIjLPKSLYTLTKKNKVPQNAT 379 

+ A +S+ AI+ + TV +++ YA R +++SRDGLLP + KPT 

Sbjct: 296 VGQDAVAGI ISVGAI IGITTVMIALLYAQVRLTFAMSRDGLLPGLFAKVHPSFKTPFRNT 355 

Query: 380 LvTGLIAMICAGI FPLSSIAEFVNI CTLAYLI ILSGAI IKLRRIEGEPKANEFKTPLVPF 439 

+TG++A AG L +LA VN+ TLA ++S A+I LR+ E KA+ F+ P VP 
Sbjct: 356 WLTGIVAAGIAGFINLGTIAHLVNMGTLAAFTVISIAVIVLRKKHPEIKAS-FRVPFVPV 414 

Query: 440 LPMLAIIICLSFMSQYKAFTOIAFAIATIIGTLIYLAYGYTHSIENK 486 
+P+++ ICL FM TW++F I +GTL+Y Y HS+ NK 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 312/337 (92%) , Positives = 324/337 (95%) 

Query: 1 MTQVFQGRSFrAEKEFSREEFEYLIDFSAHLKDLKKRGVPHHYLEGKNIALLFEKTSTRT 60 

MTQVFQGRSFLAEKDF+R E EYLIDFSAHLKDLKKRGVPHHYLEGKNIALLFEKTSTRT 
Sbjct: 1 MTQVFQGRSFLAEKDFTRAELEYLIDFSAHLKDLKKRGVPHHYLEGKNIALLFEKTSTRT 60 

Query: 61 RAAFTTAAIDLGAHPEYLGANDIQIX3KKESTEDTAKVLGRMFDGIEFRGFSQRMVEELAE 120 

RAAFTTAAIDLGAHPEYLGANDIQLGKKESTEDTAKVLGRMFDGIEFRGFSQRMVEELAE 
Sbjct: 61 RAAFTTAAIDLGAHPEYLGANDIQLGKKESTEDTAKVLGRMFDGIEFRGFSQRMVEELAE 120 

Query: 121 FSGVPVVraGLTDETOPTQMrJUDYLTIKENFGKLEGITLOTCGDGRMNVANSLLVAGTLMG 180

FSGVPVWNGLTDEWHPTQMLADY T+KENFGKLEG+TLVYCGDGRNNVANSLLV G ++G 
Sbjct: 121 FSGVPVWGLTDEWHPTQMIJffiYFTVKEI^GKLEGLTLOTCGDGRmVANSLLVTGAILG 180 

Query: 181 OTVHIFSPKELFPAEEIVKLMEYAKESGAHVLVTOmTO^ 240 

VNVHIFSPKELFP EEIV LAE YAKESGA +L+T++ DEAVKGADV YTDVWVSMGEED 
Sbjct: 181 VNVHIFSPKELFPEEEIVTLAEGYAKESGARILITEDADEAVKGADVL YTDVWVSMGEED 240 

Query: 241 KFKERVELLQPYQW^LIKKANNDNLIFLHCLPAFHDTNTVYGKDVAEKFGVKEMEVTD 300 

KFKERVELLQPYQVNM+L++KA ND LIFLHCLPAFHDTNTVYGKDVAEKFGVKEMEVTD 
Sbjct: 241 KFKERVELLQPYQVNMDLVQKAGNDKL I FLHCLPAFHDTNTVYGKDVAEKFGVKEMEVTD 300 

Query: 301 EVFRSKYARHFDQAENRMHTIKAVMAATLGNLFIPKV 337 

EVFRSKYARHFDQAENRMHTIKAVMAATLGNLFIPKV 
Sbjct: 301 EVFRSKYARHFDQAENRMHT1KAVMAATLGNLFIPKV 337 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1014 

A DNA sequence (GBSx1082) was identified in S.agalactiae <SEQ ID 3119> which encodes the amino
acid sequence <SEQ ID 3120>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0456 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10921> which encodes amino acid sequence <SEQ ID 
10922> was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3121> which encodes the amino acid 
sequence <SEQ ID 3122>. Analysis of this protein sequence reveals the following: 

Possible site: 61 



bacterial membrane --- Certainty=0.3166 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 65/113 (57%) , Positives = 83/113 (72%) 

Query: 31 MEEEFDDNDEQDTI YAVLYDGKQPVSTGRFLPETQTEARLTRIATLKGYRGNGYGTKI 1 1 90 

M ++FD NDE T+YAV+YD QPVSTG+FL ET+ EARLTRI TL Y G GYG K+ 
Sbjct: 1 MADKFDANDETRTVYAVVYDNDQPVSTGQFLAETKIEARLTRIVTIADYCGCGYGAKvTE 60 

Query: 91 ALENYAKENGYHYLTIHAELTAKDFYQTLGYQATGNIYMEDGEACQTLEKYLI 143 

ALE Y + G++ LTIH+ELTA+ FY+ LGYQ+ G +EDGE CQ+L K ++ 
Sbjct: 61 ALETYTRREGFYQLTIHSELTAQTFYWLGYQSYGPKCLEDGEYCQSLAKTIL 113 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1015

A DNA sequence (GBSx1083) was identified in S.agalactiae <SEQ ID 3123> which encodes the amino
acid sequence <SEQ ID 3124>. Analysis of this protein sequence reveals the following:

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2160 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3125> which encodes the amino acid
sequence <SEQ ID 3126>. Analysis of this protein sequence reveals the following:

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2730 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 375/411 (91%) , Positives = 395/411 (95%) , Gaps = 1/411 (0%) 

Query: 1 MTQTHPIHVFSEIGKLKKVMLHRPGKEIENLMPDYLERLLFDDIPFLEDAQKEHDAFAQA 60

MT PIHV+SEIGKLKKV+LHRPGKEIENLMPDYLERLLFDDIPFLEDAQKEHDAFAQA 
Sbjct: 1 MTAQTPIHVYSEIGKLKKVLLHRPGKEIENLMPDYLERLLFDDIPFLEDAQICEHDAFAQA 60 

Query: 61 LRNEGVEvI^yLENLAAESLTNQEIREQFIDEYIGESUWRGRATKKAIRELLLNIKDNKEL 120 
LR+EG+EVLYLE LAAESL EIRE FIDEY+ EAN+RGRATKKAIRELL+ I+DN+EL

Sbjct: 61 LRDEGIEVLYLETLAAESLVTPEIREAFIDEYLSEANIRGRATKKAIRELL1J1AIEDNQEL 120 

Query: 121 IEKTMAGIQKSELPEIPSSEKGLTDLVESNYPFAIDPMPNLYFTRDPFATIGNGVSLNHM 180 
IEKTMAG+QKSELPEIP+SEKGLTDLVESNYPFAIDPMPNLYFTRDPFATIG GVSLNHM 
Sbjct: 121 IEKTMAGVQKSELPEIPASEKGLTDLVESNYPFAIDPMPNLYFTRDPFATIGTGVSLNHM 180

Query: 181 FSETRNRETLYGKYIFTHHPEYGG-KVPMVYEREETTRIEGGDELVLSKDVLAVGISQRT 239 

FSETRNRETLYGKYI FTHHP YGG KVPMVY+R ETTRIEGGDELVLSKDVLAVGISQRT 
Sbjct: 181 FSETRNRETLYGKYIFTHHPIYGGGKVPMVYDRNETTRIEGGDELVLSKDVLAVGISQRT 240 


Query: 240 DAASIEKLLWIFKQNLGFKKVIAFEFANNRKFMHLDTVFTMVDYDKFTIHPEIEGDIjRV 299 

DAASIEKTjLVNI FKQNLGFKKVLAFEFANNRKFMHLDTVFTMVDYDKFTIHPEIEGDLRV 
Sbjct: 241 DA&SIEKIiWIFKQNLGFKKVLAFEFANNRKFMHLDTVFTMVDYDKFTIHPEIEGDLRV 300 

Query: 300 YSVTYENQDLHIEEEKGDIiADLLAKNLGVEKVELIRCGGDNLVAAGREQWNDGSNTLTIA 359

YSVTY^-N+4LHI EEKGDIA+LLA NLGVEKV+LIRCGGDNLVAAGREQWNDGSNTLTIA 
Sbjct: 301 YSVTYDlSIEELHIVEEKGDIiAELIAANLGvEKVDLIRCGGDNLVAAGREQWNDGSNTLTIA 360 

Query: 360 PGWIVYNRNTITNAILESKGLKLIKINGSELVRGRGGPRCMSMPFEREDL 410 
PGW+VYNRNTITNAILESKGLKLIKI+GSELVRGRGGPRCMSMPFERED+

Sbjct: 361 PGVWVYNRNTITNAILESKGLKLIKIHGSELVRGRGGPRCMSMPFEREDI 411 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1016

A DNA sequence (GBSx1084) was identified in S.agalactiae <SEQ ID 3127> which encodes the amino
acid sequence <SEQ ID 3128>. Analysis of this protein sequence reveals the following:

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3162 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8703> which encodes amino acid sequence <SEQ ID 8704> 
was also identified. This protein has an RGD motif and has homology with the following sequences in the 
GENPEPT database. 



Query: 35 IQTYRKAYQTFKTK-KGARSSIEALLKRVNSGNEITSINPLVDIYNAASLRFGLPIGAED 93 

+ + +A++ F K + S EAL KR + SI+P+VD+YNA S++F +P+G E+ 

Sbjct: 63 LAAWAEAFRRFGAKPQRTPCSAFALRKRALRDGGJjPSIDPVVDLYNAISVQFAIPVGGEN 122 

Query: 94 SDTFRGDLKLTITNGGDEFYLI- -GEDFNRPTLSGELAYVDDVGAVCRCFNWRDGKRTMI 151 

+ G 4L 4- +G + F + GE + GE+ + DD+G CR +NWR G RT + 

Sbjct: 123 LAAYAGPPRLWADGSETFDTLKNGEALDESPDPGEVVWRDDLGVTCRRWNWRQGVRTRL 182 

Query: 152 TDNTQNAFLVIE 163 

+ + + ++E 
Sbjct: 183 DASARRMWF ILE 194 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3129> which encodes the amino acid 
sequence <SEQ ID 3130>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0700 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 127/199 (63%), Positives = 155/199 (77%)

Query: 8 ELKQLLSDSHSLAKKYLQEKEFSQNRVIQTYRKAYQTFKTKKGARSSIEALLKRWSGNE 67 

++KQLL+DSH LAK YL FS N+V+Q YRKAYQ FKTKKGARSSIEALLKRV++G 
Sbjct: 36 DWQLLADSHEIiAICAYLTADNFSDNQWQVYRKAYQHFKTKKGARSSIEALLKRVSNGQS 95 

Query: 68 ITSINPLVDIYNAASLRFGLPIGAEDSDTFRGDLKLTITNGGDEFYLIGEDFNRPTLSGE 127 

I SINPLVDIYNAASLRFGLP GAEDSD+F GDL+LTIT+GGD+FYLIG+ N PTL E 
Sbjct: 96 IPSINPLvDIYNAASLRFGLPAGAEDSDSFIGDLRLTITDGGDDFYLIGDADNNPTLPNE 155 

Query: 128 lAYVDDVGAVCRCFNWPJDGKRTMITDNTQNAFLVIELIDNGREIIFKFALDFIATNTNRF 187 

L Y DD+GA CRC NWRDG+RTM+T++T+NAFL+IE +D + +EAL FI + + 
Sbjct: 156 LCYKDDIGAFCKCLNWRDGERTMVTEHTKNAFLIIEALDQEGQNRLQEALKFIEGSAKMY 215 

Query: 188 LKAKTOTIILDKEHSEITL 206 

L A T +LDK++ + h 
Sbjct: 216 LHAITSVHVLDKDNPHVPL 234 

SEQ ID 8704 (GBS298) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 44 (lane 2; MW 29kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 5; MW 54kDa). 

The GBS298-GST fusion product was purified (Figure 203, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 297), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1017 

A DNA sequence (GBSx1085) was identified in S.agalactiae <SEQ ID 3131> which encodes the amino
acid sequence <SEQ ID 3132>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3770 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1018 

A DNA sequence (GBSx1086) was identified in S.agalactiae <SEQ ID 3133> which encodes the amino
acid sequence <SEQ ID 3134>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4263 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB95946 GB:Y17554 Crp/Fnr family protein [Bacillus licheniformis]
Identities = 85/214 (39%), Positives = 126/214 (58%), Gaps = 14/214 (6%)

Query: 11 RQLDDFKHFTIEQFDHIVSHIKHRTALKNHTLFFEGDYREKLFLIQSGHVKIEQSDASGS 70 

R L+D K F I R+ K LF E D RE+++L+ G +K+E+S+ +GS 

Sbjct: 22 RDLEDMKQF IYWRSYHKGQILFMEDDPRERMYLLLDGFIKLEKSNEAGS 70 

Query: 71 FIYTDYWQGTVFPYGGLFLDDDYHFSAVAITDIEYFSLPMALYEEYSLQNINQMKHLCR 130 

YTDYVR T+FP+GGLF D+ YH++A A+TDIE + +PM ++E+ N N + + 
Sbjct: 71 MFYTDYVRPHTLFPFGGLFRDEHYHYAAEaLTDIELYYIPMNIFEDLVRDNKNLtiYDira 13 0 

Query: 131 KYSKLIjRVHEIRLRNMVTSSASMRVIQSIATIj LLQVPTERGHLPFPITTIEIANMSG 187 

S +L +HE RL+ + S A RV Q++ L L Q + + PIT EIA +SG 
Sbjct: 131 HLSDILALHEERLKRITLSHAHDRVTQAIYYLTESLGQKESNSTVINCPITAaEIAKISG 190 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3135> which encodes the amino acid
sequence <SEQ ID 3136>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4478 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 130/224 (58%), Positives = 180/224 (80%)

Query: 1 MITKEQYFYFRQMDFKHFTIEQFDHIVSHIKHRTALKNHTLFFEGDYREKLFLIQSGHV 60 

+1 +E Y Y R+L+DF++F+IEQFD IV ++ R A K4H LFFEGD R+KLFL+ SG+ 
Sbjct: 1 VIRREDYQYLRKLNDFRYFSIEQFDKIVGQMEFRKAKKDHILFFEGDKRDKLFLVTSGYF 60 

Query: 61 KIEQSDASGSFIYTDYTOQGTOFPYGGLFLDDDYHFSAVAITDIEYFSLPMALYEEYSLQ 120 

K+EQSD SG+F+YTD++R GT+FPYGGLF DD YHFS VA+TD+ YF P+ L+E+YSL+ 
Sbjct: 61 KVEQSDQSGTFMYTDFIRHGTIFPYGGLFTDDYYHFSWAMTDVTYFYFPVDLFEDYSLE 120 

Query: 121 NINQMKHLCRKYSKLLRVHEIRLRNMVTSSASMRVIQSLATLLLQVPTERGHLPFPITTI 180 

N QMKHL K SKLL +HE+R+RN++TSSAS RVIQSLA LL+++ + LPF +TT 
Sbjct: 121 mLQMKHLYSKMSKLLEMEt,R\mNLITSSASSRVIQSLAILLVEMGKDSDTLPFQLTTT 180 

Query: 181 E I ANMSGTTRE WSHVLKELRQKD I VEMKGIC!<LLYNNI<NYFICKF 224 

+ IA +SGTTRETVSHVL++L++++++ +KGK L Y +K+YF ++ 
Sbjct: 181 DIAQISGTTRETVSHVLRDLKKQELITIKGKYLTYLDKDYFLQY 224 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1019 

A DNA sequence (GBSx1087) was identified in S.agalactiae <SEQ ID 3137> which encodes the amino
acid sequence <SEQ ID 3138>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1643 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2161> which encodes the amino acid
sequence <SEQ ID 2162>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1201 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 461/493 (93%), Positives = 478/493 (96%)

Sbjct: 1 MSNWDTKFLKKGYTFDDVLLIPAESHVLPNEVDLKTKLM)NLTLNIPIITAZMDTVTGSK 60

Query: 62 miAIARAGGLCSIIHKISMSIVDQAEEWKVKKSENGVIIDPFFLTPDNTVSEAEELMQNY 121 

MAIAIARAGGLG+IHKNMSI +QAEEVRKVKRSENGVI IDPFFLTP++ VSEAEELMQ Y 
Sbjct: 61 MAIAIAI^GGLGVIHKNMSITEQAEEVRKVKRSENGVIIDPFFLTPEHKVSEAEELMQRY 120 

Query: 122 RISGVPIVETLENRKLVGIITNRDMRFISDYKQLISEHKTSQNLVTAPIGTDLETAERIL 181 

RISGVPIVETIi NRKLVGI ITNRDMRF I SDY ISEHMTS++LVTA +GTDLETAERIL 
Sbjct: 121 RISGVPIVETIANRKLVGIITNRDMRFISDYNAPISEHMTSEHLVTAAVGTDLETAERIL 180 

Query: 182 HEHRIEKLPLVDDEGRtSGLITII<IlIEKVIE??KAAKDEFGRIiLVAGAVGVTSDTFERAE 241

HEHRIEKLPLVD+ GRLSGLITIKDIEKVIEFP AAKDEFGRLLVA AVGVTSDTFERAE 
Sbjct: 181 HEHRIEKLPLVDNSGRLSGLITIKDIEKVIEFPHAAKDEFGRLLVAAAVGVTSDTFERAE 240 

Query: 242 ALFEAGADAIVIDTAHGHSAGVLRKIAEIRAHFPNRTLIAGNIATAEGARALYDAGVDW 301 

ALFEAGADAIVIDTAHGHSAGVLRKIAEIRAHFPNRTLIAGNIATAEGARALYDAGVDW 
Sbjct: 241 ALFEAGADAIVIDTAHGHSAGVLRKIAEIRAHFPNRTLIAGNIATAEGARALYDAGVDW 300 

Query: 302 KVGIGPGSICTTRWAGVGVPQITAIYDAAAVAREYGKTIIADGGIKYSGDIVKALARGG 361 

KVGIGPGSICTTRWAGVGVPQ+TAIYDAAAVAREYGKTIIADGGIKYSGDIVKAIiAAGG 
Sbjct: 301 KVGIGPGSICTTRWAGVGVPQVTAIYDAAAVAREYGKTIIADGGIKYSGDIVKALAAGG 360 

Query: 362 NAWLGSMFAGTDEAPGETEIFQGRKFKTYRGMGSIAAMKKGSSDRYFQGSVNEANKLVP 421 

NAVMLGSMFAGTDEAPGETEI+QGRKFKTYRGMGSIAAMKKGSSDRYFQGSVNEANKLVP 
Sbjct: 361 NAVMLGSMFAGTDEAPGETEIYQGRKFKTYRGMGSIAAMKKGSSDRYFQGSVNEANKLVP 420 

Query: 422 EGIEGRVAYKGSVADIVFQMLGGIRSGMGYVGAANIKELHDNAQFVEMSGAGLKESHPHD 481 

EGIEGRVAYKG+ +DIVFQMLGGIRSGMGYVGA +I+ELH+NAQFVEMSGAGL ESHPHD 
Sbjct: 421 EGIEGRVAYKGAASDIVFQMLGGIRSGMGYVGAGDIQELHENAQFVEMSGAGLIESHPHD 480 

Query: 482 VQITNEAPNYSVH 494 

VQITNEAPNYSVH 
Sbjct: 481 VQITNEAPNYSVH 493 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1020 

A DNA sequence (GBSx1089) was identified in S.agalactiae <SEQ ID 3139> which encodes the amino
acid sequence <SEQ ID 3140>. This protein is predicted to be MutR. Analysis of this protein sequence
reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1841 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD04237 GB:AF007761 MutR [Streptococcus mutans] 
Identities = 51/215 (23%) , Positives = 102/215 (46%) , Gaps = 9/215 (4%) 

Query: 

Sbjct:

Query: 65 SILDSKVKAGTSOTDLEQLTLLESYRDNED1MR1FSFQKQQSCDRIESNVLKIIAKLFIS 124 

S ++ + ++L L4+ +D + + +1 + + + K++ K + 

Sbjct: 66 SYAFTQYQESDLFKTGKKLVELQTKKDIKGLKKILI03YPDTETYNVYNRLNKLVIKAaVY 125 

Query: 125 NLGLNiyKLPQDEINLVVTYLNGvTQYNDFYFKVICYFQDILPED — VILNKI SNMT 178 
+L + + +E + +YL + ++ ++ + IL +D V L K +

Sbjct: 126 SLDSSFEITNEEKEFLTSYLYAIEEWTEYELYLFGNTLFILSDDDLVFLGKAFVERDKLY 185 

Query: 179 KEQLPYSKSLVNLLIKQVIIALEKDSVDKAIVFAD 213 

+E + K +LI ++I +E S A F + 
Sbjct: 186 RELSEHKKRAELVLINLILILVEHHSFYHAQYFIE 220 

There is also homology to SEQ ID 628. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1021 

A DNA sequence (GBSx1090) was identified in S.agalactiae <SEQ ID 3141> which encodes the amino
acid sequence <SEQ ID 3142>. Analysis of this protein sequence reveals the following: 



Possible site: 15

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.77 Transmembrane 269 - 285 ( 265 - 287)
INTEGRAL Likelihood = -6.90 Transmembrane 33 - 49 ( 31 - 51)
INTEGRAL Likelihood = -6.79 Transmembrane 182 - 198 ( 176 - 200)
INTEGRAL Likelihood = -6.37 Transmembrane 117 - 133 ( 113 - 135)
INTEGRAL Likelihood = -5.57 Transmembrane 240 - 256 ( 232 - 259)
INTEGRAL Likelihood = -3.40 Transmembrane 223 - 239 ( 220 - 239)
INTEGRAL Likelihood = -0.96 Transmembrane 56 - 72 ( 55 - 72)

Final Results

bacterial membrane --- Certainty=0.5310 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S. pyogenes <SEQ ID 3143> which encodes the amino acid
sequence <SEQ ID 3144>. Analysis of this protein sequence reveals the following:



Possible site: 48

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.99 Transmembrane … - 285 ( 264 - 286)
INTEGRAL Likelihood = -8.76 Transmembrane … - 133 ( 112 - 135)
INTEGRAL Likelihood = -7.70 Transmembrane … - 195 ( 174 - 200)
INTEGRAL Likelihood = -4.83 Transmembrane … - 50 ( 32 - 52)
INTEGRAL Likelihood = -4.46 Transmembrane … - 229 ( 211 - 230)
INTEGRAL Likelihood = -4.14 Transmembrane … - 256 ( 232 - 259)
INTEGRAL Likelihood = -0.69 Transmembrane … - 107 ( 91 - 108)
INTEGRAL Likelihood = -0.32 Transmembrane … - 20 ( 4 - …)

Final Results

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9181> which encodes the amino acid sequence 
<SEQ ID 9182>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood =-10.… Transmembrane … - 275 ( 254 - …)
INTEGRAL Likelihood = … Transmembrane … - 123 ( 102 - …)
INTEGRAL Likelihood = … Transmembrane … - 185 ( 164 - …)
INTEGRAL Likelihood = … Transmembrane … - 40 ( 22 - …)
INTEGRAL Likelihood = … Transmembrane 203 - 219 ( 201 - …)
INTEGRAL Likelihood = … Transmembrane … - 246 ( 222 - …)
INTEGRAL Likelihood = … Transmembrane 81 - 97 ( 81 - …)

Final Results

bacterial membrane --- Certainty=0.539 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 200/287 (69%) , Positives = 244/287 (84%) 

Query: 1 MEGLLIALIPMFAWGSIGFVSNKIGGRPNQQTFGMTLGALLFAIIVWLFKQPEMTASLWI 60 

+EG+ ALIPMF WGSIGFVSNKIGG+P+QQT GMT GALLF++ VWL +PEMT LW+ 
Sbjct: 1 LEGIFYALIPMFTWGSIGFVSNKIGGKPSQQTL3MTFGALLFSLAVWLIVRPEMTLQLWL 60 

Query: 61 FGILGGILWSVGQNGQFQAMKYMGVSVANPLSSGAQLVGGSLVGALVFHEWTKPIQFILG 120 

FGILGG +WS+GQ GQF AM+YMGVSVANPLSS3+QLV GSL+G LVFHEWT+P+QF++G 
Sbjct: 61 FGILGGFIWSIGQTGQFHAMQYMGVSVANPLSSGSQL\^LGSLIGVLVFHEWTRPMQFWG 120 

Query: 121 LTALTLLVIGFYFSSKRDVSEQRLATHQEFSKGFATIAySTVGYISYAVLFNNIMKFDAM 180 

AL LL++GFYFSSK+D + + FSKGF + YST+GY+ YAVLFNNIMKF+ + 

Sbjct: 121 SIALLLLIVGFYFSSKQDDANAQVNHLHNFSKGPl^ALTYSTIGYVMYAVLFNNIMKFEVL 180 

Query: 181 AVILPmVGMCLGAICFMKFRWFFAVWKNMITGLMWGVGNVFMLLAAAKAGLAIAFSF 240 

+VILPMAVGM LGAI FM F+++ + V+KN + GL+WG+GN+FMLLAA+KAGIAIAFSF 
Sbjct: 181 SVILPMAVGMVLGAITFMSFKISIDQYVIICNSWGLLWGIGNIFMLLARSKAGLAIAFSF 24 0 

Query: 241 SQLGVIISIIGGILFLGETKTKKEQKWWMGILCFVMGAILLGIVKS 287 

SQLG IISI+GGILFLGETKTKKE +WW GI+CF+4GAILLG+VKS 
Sbjct: 241 SQLGAIISIVGGILFLGETJCTKKEMRWWTGIICFIVGAILLGWKS 287 
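
The summary line that precedes each alignment packs several ratios into one string. As a minimal sketch (the helper name `parse_blast_stats` is hypothetical and not part of any BLAST toolkit), the counts can be extracted with a regular expression that tolerates the stray spaces seen in this text. The sketch only parses the figures and prints the raw ratio; it does not attempt to reproduce how the reported percentages were rounded.

```python
import re

# One field looks like "Identities = 200/287 (69%)"; optional spaces are allowed
# around "=", "/" and before the parenthesis because the scanned text is uneven.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_blast_stats(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in FIELD.findall(line)
    }


stats = parse_blast_stats("Identities = 200/287 (69%) , Positives = 244/287 (84%)")
for name, (count, length, pct) in stats.items():
    # Print the raw ratio next to the percentage reported in the text.
    print(f"{name}: {count}/{length} = {count / length:.3f} (reported {pct}%)")
```

Fields that are absent from a line (here, Gaps) are simply missing from the returned dictionary.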

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1022 

A DNA sequence (GBSx1092) was identified in S.agalactiae <SEQ ID 3145> which encodes the amino
acid sequence <SEQ ID 3146>. This protein is predicted to be recF protein (recF). Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2553 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S. pyogenes <SEQ ID 3147> which encodes the amino acid 
sequence <SEQ ID 3148>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1677 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 248/364 (68%), Positives = 300/364 (82%), Gaps = 1/364 (0%) 

Query: 1 MWIKNISLKHYRNYEEaQVDFSPNLNIFIGRNAQGKTNFLERIYFLALTRSHRTRSDKEL 60 

MWIK + LKHYRNY+ FS IiN+FIG NAQGKTWFLEAIYFL+LTRSHRTR+DKEIa 

Sbjct: 1 ^IKELELKHVRNYDHLIASFSSGI^FIGNNACKSKTNFLEAIYFLSLTRSHRTRADKEL 60 

Query: 61 VHFKHHDVQITGEVIRKSGHLNLDIQLSEKGRITKVNHLKQAKLSDYIGAMTVVLFAPED 120

+HF H V +TG++ R SG ++L+I LS+KGR+TK+N LKQAKLSDYIG M WLFAPED 
Sbjct: 61 IHFDHSTVSLTGKIQRISGTTOLEINLSDKGRVTKINALKQAKLSDYIGTMMVVLFAPED 120 

Query: 121 LQLVKGAPSLRRKFLDIDIGQIKPTYIMLS>™HVLKQROTYLlCr™ra)KTFLTVLDE 180 

LQLVKGAPSLRRKF+DID+GQIKP YL+ELS+YNHVLKQRN+YLK+ +D FIi VLDE 
Sbjct: 121 LQLVKGAPSLRRKFIDIDLGQIKPVYLSELSHYNHVIjKQRNSYLKSAQQIDAAFIiAVLDE 180 

Query: 181 QLADYGSRVIEHRFDFIQALNDEADKHHYII3TELEHLSIHYKSSIEFTDKSSIREHFLN 240 

QLA YG+RV+EHR DFI AL EA4 HH IS LE LS+ Y+SS+ F K++I + FL+ 
Sbjct: 181 QLASYGARVMEHRIDFINALEKEANTHHQAISNGLESLSLSYQSSWFDKKTNIYQQFLH 240 

Query: 241 QLSKSHSRDIFKKNTSIGPHRDDITFFII'TOINATFASQGQQRSLILSLKLAEIEIjIKrVT 300 

QL K+H +D F+KNTS+GPHRD++ F+IN +NA FASQGQ RSLILSLK+AE+ L+K +T 
Sbjct: 241 QLElO^QKDFFRKlfrSVGPHRDEIAFYINGMNANFASQGQHRSLILSLKMAEVSLMKALT 3 00 

Query: 301 MDYPILLLDDVMSELDimRQLKLLEG-IKHWQTFITTTSLEHLSALPDQLKIFNVSDGT 359 

D PILLLDDVMSELDN RQ KLLE IKENVQTFITTTSL+HLS LP+ ++IF+V+ GT 
Sbjct: 301 GDNPILLLDDVMSELDOTRQTKLLETVIK3WQTF1TTTSLDHLSQLPEGIR1FHVTKGT 360 

Query: 360 ISIN 363
+ I+
Sbjct: 361 VQID 364
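
In these alignments the middle line marks an identical residue with its letter and a conservative substitution with `+`, so the Identities and Positives counts can be recovered by counting symbols. The following is a minimal sketch under that assumption; the function names are hypothetical, and the example reads the scanned match line of the final segment as `+ I+`.

```python
def count_from_midline(midline):
    """Count identities (letters) and positives (letters plus '+') in a match line."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def count_identical_columns(query, sbjct):
    """Count columns where both aligned sequences carry the same residue."""
    return sum(q == s and q != "-" for q, s in zip(query, sbjct))


# Final segment of the alignment: Query 360-363 against Sbjct 361-364.
print(count_from_midline("+ I+"))               # identities and positives for this segment
print(count_identical_columns("ISIN", "VQID"))  # identical columns found directly
```

Summing the per-segment counts over a whole alignment should give the totals on the summary line, provided the sequence text is free of scanning errors.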

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1023 

A DNA sequence (GBSx1093) was identified in S.agalactiae <SEQ ID 3149> which encodes the amino
acid sequence <SEQ ID 3150>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1807 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA61548 GB:X89367 orfl21 [Lactococcus lactis] 
Identities = 56/116 (48%), Positives = 74/116 (63%), Gaps = 3/116 (2%) 

Query: 3 YKIjFDEYITLQSLLKEIGI IQSGGAIKKFLSDNR- -VLFNGDLENRRGKKLRIiGDIITIP 60 

Y LF+EYITL LLKE+G+I +GG K FIA+N + +NG+ ENRRGKKLR GD++ P 
Sbjct: 4 YILFEEYITLGQLLKEIfiLISTGGQPKIFLAENEGNIFYNGEAENRRGKKLRDGDLLEFP 63 

Query: 61 DQNIEIIIRKPSDQEIEERNIEIAEKQRVSAIVKEMNKNTNKGKSKTSKKPTOFPG 116 

++++ + I+E E AE+ RV AIVK+MN NK K P RFPG 

Sbjct: 64 TFDLKOTFEQADADAIKEHEAEIOffiEARVKAIVKKMNAE-NKTTKPAKKAPPRFPG 118 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3151> which encodes the amino acid 
sequence <SEQ ID 3152>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0493 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 74/136 (54%) , Positives = 94/136 (68%) , 
Query: 
Sbj ct : 

Query: 61 DQNI EI I IRKPSDQE I EERNIE IAEKQRVSAI VTCEMNKNTNKGKSK TSKK 110 

DQ++ I I +PS +E E+ E+AEK RV+A+VK+MN+ K SK T+KK 
Sbjct: 69 DQDLIITIVEPSQEEKEQFAEEMAEKTRVAALVKQKNQANKKTSSKHNNRQSTTKKSLRA 128 

Query: 111 PVRFPG 116 

PVRFPG 

Sbjct: 129 TKKTKGKPTAPVRFPG 144 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1024 

A DNA sequence (GBSx1094) was identified in S.agalactiae <SEQ ID 3153> which encodes the amino
acid sequence <SEQ ID 3154>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.86 Transmembrane 269 - 285 ( 267 - 285)

Final Results

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
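
Each "Final Results" block lists a certainty value for three candidate locations, and the predicted location is the one with the highest value. As a minimal sketch (the regular expression and function names are illustrative assumptions, written to tolerate the stray spaces and dashes that occur in this text), the block can be read as follows:

```python
import re

# "bacterial <location>", optional separator characters, then "Certainty=<value> (<label>)".
RESULT = re.compile(
    r"bacterial (\w+)\W*Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, label)} for one Final Results block."""
    results = {}
    for match in RESULT.finditer(text):
        # Remove spaces inserted by scanning, e.g. "0 . 1744" becomes "0.1744".
        certainty = float(match.group(2).replace(" ", ""))
        results[match.group(1)] = (certainty, match.group(3))
    return results


def best_location(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda location: results[location][0])


block = """
bacterial membrane --- Certainty=0 . 1744 (Affirmative) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
"""
parsed = parse_final_results(block)
print(best_location(parsed), parsed[best_location(parsed)])
```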

A related DNA sequence was identified in S.pyogenes <SEQ ID 3155> which encodes the amino acid 
sequence <SEQ ID 3156>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3008 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/413 (54%) , Positives = 309/413 (73%) 

Query: 1 MKI V^VSLHLIKNQQFKTNHLTFRFSGDFTTOKTVARRSLVAQMLVTANAKYPKVQEFRE 60 

MKTV+GV LHLIK +QFKTNH+TFRFSGD N KTVA++ LVAQML TAN YP V++FRE 
Sbjct: 1 MKIVCGVQLHLIKTKQFRTNHITFRFSGDI^QKTVAKKVLVAQMI^TANECYPTVRQFRE 60 

Query: 61 KIASLYGASLSTKISTKGLVHIVDIDIVFVKNTFTLEQENIVEQIITFLEDMLFSPLISIi 120 

KLA LYGASLST + TKGLVHIVDIDI F+++ + E I++++I FL+D+LFSPL+S+ 
Sbjct: 61 KIjARLYGASLSOT^TKGLTOIVDIDITFICJJRYACNGEKILDEMIQFLKDILFSPLLSI 120 

Query: 121 EQYQTSIFDTEKKNLIQYLEADIEDNFYSSDIALKSLFYNNKTLRLPKYGTASLVESENS 180 

QYQ +F+TEK NLI Y+E+D ED+FY S L +K LFY NK L++ +YG+ L+ E + 
Sbjct: 121 AQYQPKVFETEKNNLINYIESDREDSF-fYSSLKVTffiLFYa<lKNLQMSEYGSPELIAKETA 180 

Query: 181 FTAYQEFQKMLKEDQLDIFWGDFDDYRMIQAFNRmFEPRHKVIAFDYTQTYENITRSQ 240 

+T+YQEF KML EDQ+DIF++GDFDDYR++Q ++ + R+K L F + Q NI + 
Sbjct: 181 YTSYQEFHKMLWEDQIDIFILGDFDDYRWQLIHQFPLDNRNKNLNFFHLQNSVNIIKES 240 

Query: 241 VEDKDVNQSIMQLAYHLPITYKDEDYFALIVFNGLFGAFAHSLLFTEIREKQGLAYTIGS 300 

Query: 301 QFDSFTGLFTIYAGIDKENRERFLKLINKQFTOIKMGRFSSTLLKQTKDILKMNYVIASD 360

+FDS+TGLF IY GID ++R + L+LI ++ N IKMGRFS L+K+T+ +L N +L+ D 
Sbjct: 301 RFDSYTGLFEIYTGIDSQHRTKTLQLIIQEimiKMGRFSEQLIKKTRSMLUffilALLSED 360 

Query: 361 KPKVIVDHIYHEHYLDQFHTSALFIDKVDDVTKSDIVEVATKLKLQAFYFLEG 413 
K I++ IY Y+D ++ +1 V++V K+DI+ VA LKLQ YFLEG 

SEQ ID 3154 (GBS400) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 76 (lane 2; MW 49.2kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 84 (lane 3; MW 74kDa) and in Figure 
177 (lane 6; MW 74kDa). 

GBS400-GST was purified as shown in Figure 217, lane 10. 
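
The two fusion products of the same protein differ in apparent size, which presumably reflects the larger GST partner relative to the short His tag (GST alone is commonly quoted at roughly 26 kDa; that figure is an outside assumption, not taken from this document). A small sketch of the arithmetic, using only the molecular weights quoted above and a hypothetical helper name:

```python
# Apparent molecular weights (kDa) quoted in the text for the GBS400 fusions.
HIS_FUSION_KDA = 49.2
GST_FUSION_KDA = 74.0


def fusion_size_difference(gst_kda, his_kda):
    """Difference between the GST- and His-fusion products, rounded to 0.1 kDa."""
    return round(gst_kda - his_kda, 1)


print(fusion_size_difference(GST_FUSION_KDA, HIS_FUSION_KDA))
```

A difference in the mid-20 kDa range is what one would expect if both bands correspond to the same protein carrying different fusion partners.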

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1025 

A DNA sequence (GBSx1095) was identified in S.agalactiae <SEQ ID 3157> which encodes the amino
acid sequence <SEQ ID 3158>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.3473 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3159> which encodes the amino acid 
sequence <SEQ ID 3160>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4298 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 207/424 (48%) , Positives = 276/424 (64%) , Gaps = 3/424 (0%) 

Query: 5 KITYQNI^EEVYKLTLESGLNVYIiIPKPSFKKTVGVnTANFGSLHTKYTRNGCVEHYPAG 64 

KI Y N+ E++Y + LE+GL VY IK F E +LT FGSL K T + PAG 
Sbjct: 6 KINYPNIDEDLYYVKLENGLTVYFIKKIGFLEKTAMLTVGFGSLDNKLTVDDESRDAPAG 65 

Query: 65 IAHFLEHKLFELDKGQDAATQFTKYGAESNAFTTFDKTSFYFSTISHITNCLDILLDFVL 124 

IAHFLEHKLFE + G D + +FT+ GAE+NAFTTF++TSF+FST S L++L FVL 

Sbjct: 66 IAHFLiEHKLFEDESGGDISLKFTQLGAETNAFTTFNQTSFFFSTASKFQENLELLQYFVL 125 

Query: 125 TTNFTEESITKEKDIIKQEIEMYQDDPEYRLYCGVLSNLYPNSPLRFDIAGDYQSISQIT 184 

+ N T+ES+++EK II QEI+MYQDD +YR Y G+L NL+P + IA DIAG SI +IT 
Sbjct: 126 SANITDESVSREKKIIGQEIDMYQDDADYRAYSGILQNLFPKTSLANDIAGSKASIQKIT 185 

Sbjct: 186 KILLETHHr^FYQPTNMSLFIVGDIDIDETFlAIQRFQTTLSYPDRKRVTVDPLHYYPVI 245

Query: 243 Kl^SCHMTVTKPKLRIGYRKSraMIHGSYLICEKIGLQLFFMLLGWTSTINQDWYESGQI 302 
K++S M VT KL +G+R + SI « L+LF +ML+GWTS I YE G+I 
Sbjct: 246 KSSSVDMDVTTAKLWGFRGYLTLTQHSLLTYRIALKLFLSMLIGWTSKIYHTLYEDGKI 305

Query: 303 DDSFDIEIEVHPDFECVIISLDTTEPIAFSTQLRI.riLKNALQSSDLTESHLKNVKRELYG 362 

DDSFD+++E+H +F+ V+ISLDT EPIA S +R h S + T HL +K+E+YG 

Sbjct: 306 DDSFDVDVEIHHNFQFVLISUDTPEPIMISM'IRQKIjATIKISKEFTNEHIiNLIjKKEMYG 365

Query: 363 DFIiRSLDSIENLMIQFVTYLYDG-KTMYLDLPSIVEELDLEDVITIGKDFLDNADTSDFV 421

DF++SLDSIE4L QF YL D K Y D+P I+E L L+DV+TIGK F + AD SDF 
Sbjct: 366 DFIQSLDSIEHLTHQFSLYLSDSDKETYFDIPKIIERLTLKDWTIGKRFFEKADASDFT 425

Query: 422 IFPK 425

+FPK 

Sbjct: 426 VFPK 429 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1026 

A DNA sequence (GBSx1096) was identified in S.agalactiae <SEQ ID 3161> which encodes the amino
acid sequence <SEQ ID 3162>. This protein is predicted to be phosphatidylglycerophosphate synthase
(pgsA). Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.17 Transmembrane 17 - 33 ( 14 - 39)
INTEGRAL Likelihood = -3.77 Transmembrane 92 - 108 ( 88 - 108)
INTEGRAL Likelihood = -2.87 Transmembrane 144 - 160 ( 142 - 162)
INTEGRAL Likelihood = -1.65 Transmembrane 42 - 58 ( 42 - 59)

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
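
Each INTEGRAL line in the output above reports a likelihood score, a predicted transmembrane segment and, in parentheses, a wider flanking range. As a minimal sketch (the class and function names are illustrative assumptions, not part of the prediction program), the four lines reported for this protein can be parsed and ordered along the sequence like this:

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    range_start: int
    range_end: int


# e.g. "INTEGRAL Likelihood = -8.17 Transmembrane 17 - 33 ( 14 - 39)"
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract every complete INTEGRAL line, sorted by start position."""
    segments = [
        Segment(float(lik), int(s), int(e), int(rs), int(re_))
        for lik, s, e, rs, re_ in TM_LINE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


report = """
INTEGRAL Likelihood = -8.17 Transmembrane 17 - 33 ( 14 - 39)
INTEGRAL Likelihood = -3.77 Transmembrane 92 - 108 ( 88 - 108)
INTEGRAL Likelihood = -2.87 Transmembrane 144 - 160 ( 142 - 162)
INTEGRAL Likelihood = -1.65 Transmembrane 42 - 58 ( 42 - 59)
"""
for seg in parse_segments(report):
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

Lines whose fields were lost in scanning do not match the pattern and are skipped rather than guessed at.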

A related GBS nucleic acid sequence <SEQ ID 10293> which encodes amino acid sequence <SEQ ID 
10294> was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3163> which encodes the amino acid
sequence <SEQ ID 3164>. Analysis of this protein sequence reveals the following:

Possible site: 48
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.64 Transmembrane 76 - 92 ( 72 - 102)
INTEGRAL Likelihood = -5.36 Transmembrane 136 - 152 ( 131 - 164)
INTEGRAL Likelihood = -2.34 Transmembrane 98 - 114 ( 97 - 114)

Final Results

bacterial membrane --- Certainty=0.3654 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/180 (80%) , Positives = 160/180 (88%) 

Query: 8 mKXENIPMjLTvWILMIPLFIVLTSVTTSTT^ 67

M+KKENI PNLLT+VRI MIP F+ +TS + WHI AA++FAIAS TDYLDGYLARKW 
Sbjct: 1 MIKKENIPNLLTLTOIAMIPFFLFITSSSNKVGMHIFAAVIFAIASFTDYLDGYLARKWH 60 

Query: 68 WTNFGKFADPIADKML VMSAFI.MLVGLDIAPAWVSAI 1 1 CRELAVTGLRLLLVETGGTV 127

V +NFGKFADPIADKMLVMSAFIMLVGL L PAWVSA+ I 1 CRELAVTGLRLLLVETGG V 
Sbjct: 61 VASNFGKFADPLADKMLVMSAFIMLVGLGLVPAWVSAVIICRELAVTGLRLLLVETGGKV 120 

Query: 128 LAAAMPGKIKTATQMFAVIFLLVHWMTI1GNIMI1YIALFFTI1YSGYDYFKGAGFLFKDTFK 187

LAAAMPGKIKTATQM ++I LL HW+ LGN++LYIALFFT+YSGYDYFKGA FLFKDTFK 
Sbjct: 121 LAAAMPGKIKTATQMLSIILLLCHWIFLGNVLLYIALFFTIYSGYDYFKGASFliFKDTFK 180 

A related GBS gene <SEQ ID 8705> and protein <SEQ ID 8706> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
SRCFLG: 0
McG: Length of UR: 9
Peak Value of UR: 3.03
Net Charge of CR: 1
McG: Discrim Score: 6.36
GvH: Signal Score (-7.5): -0.400001
Possible site: 48
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 49
ALOM program count: 2 value: -3.77 threshold: 0.0
INTEGRAL Likelihood = -3.77 Transmembrane 85 - 101 ( 81 - 101)
INTEGRAL Likelihood = -2.87 Transmembrane 137 - 153 ( 135 - 155)
PERIPHERAL Likelihood = 1.27 109
modified ALOM score: 1.25

icml HYPID: 7 CFP: 0.251

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1027 

A DNA sequence (GBSx1097) was identified in S.agalactiae <SEQ ID 3165> which encodes the amino
acid sequence <SEQ ID 3166>. This protein is predicted to be ABC transporter ATP-binding protein
(potA). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1805 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC61484 GB:AF082738 ABC transporter ATP-binding protein 
[Streptococcus pyogenes] 
Identities = 201/279 (72%), Positives = 231/279 (82%) 

Query: 1 MTNIITVNI^FFKYnSNQTHYQLENVSFHVKC^ 60 

M+ II + + F Y +Q L+ VSFHVKQGEWLSIIGHNGSGKSTT+RLIDGLLE E 
Sbjct: 18 MSAIIELKKVTFNYHICDQEKPTLDGVSFHVKQGEWLSIIGHNGSGKSTTIRLIDGLLEPE 77 



Query: 61 SGQIIIDGQELTEDNVWELRHKIGMVFQNPDNQFVGATVEDDVAFGLENKGIPLKDMKER 120 
SG II+DG LT NVWE+RHKIGMVFQNPONQFVGATVEDDVAFGLENKGI +D+KER 
Sbjct: 78 SGS I IVDGDLLTITNVWEIRHKIGMVFQNPDNQFVGATVEDDVAFGLENKGIAHEDIKER 137

Query: 121 VnQALDLVGMSEFKMREPARLSGGQKQRVAIAGAVAMRPQVIILDEATSMLDPEGRLELI 180 
V+ AL+LVGM FK +EPARLSGGQKQRVAIAGAVAM+P++IILDEATSMLDP+GRLELI 
Sbjct: 138 VMHALELVGMQNFKEKEPARLSGGQKQRVAIAGAVAMKPKIIILDEATSMLDPKGRLELI 197



Query: 181 RTIRAIRQKYNLTVISITHDLDEVALSDRVIVMKNGICVESTSTPKRLFGRGNRLISLGLD 240 

+TI + IR Y LTVI S I THDLDEVALSDRV+VMK+G+ VESTSTP+ LF RG+ L+ LGLD 
Sbjct: 198 KTIKNIRDDYQLTVISITHDLDEVALSDRVLVMKDGQVESTSTPEQLFARGDELLQLGLD 257 


Query: 241 VPFTSRLMAELA2WGLDIGTEYLTEKELE3QLWELNLKM 279 

+PFT+ ++ L G I YLTEKELE Qlt +L KM 
Sbjct: 258 IPFTTSWQMLQEEGYPIDYGYLTEKELENQLCQLISKM 296 



A related DNA sequence was identified in S. pyogenes <SEQ ID 3167> which encodes the amino acid
sequence <SEQ ID 3168>. Analysis of this protein sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2235 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.
Identities = 200/279 (71%), Positives = 231/279 (82%)

Query: 1
Sbjct: 18

Query: 61
II+DG LT NVWE+RHKIGMVBQilPLf'i 1 ^ DVAFGLENKGI +D+KER
Sbjct: 78

Query: 121
V+ AL+LVGM FK +EPARLSGGQKQRVAIAGAVAM+P++IILDEATSMLDP+GRLELI
Sbjct: 138

Query: 181
+TI+ IR Y LTVTSITHDLDEVALSDRV+VMK+G+VESTSTP+ LF RG+ L+ LGLD
Sbjct: 198

Query: 241
+PFT+ ++ L G + YLTEKELE QL +L KM
Sbjct: 258

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1028 

A DNA sequence (GBSx1098) was identified in S.agalactiae <SEQ ID 3169> which encodes the amino
acid sequence <SEQ ID 3170>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 154 - 170 ( 154 - 170)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11922 GB:Z99104 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 141/242 (58%) , Positives = 188/242 (77%) , Gaps = 1/242 (0%) 



Query: 16
Sbjct: 3

Query: 76
Sbjct:

Query: 135
Sbjct: 123

Query: 196
Sbjct: 183

Query: 255
Sbjct: 243

E+L D++PFELSGGQMRRVMAG+J J M>1+P+VLVLDEPTAGLDP+GRKE+M +F LH++G
+T +LVTH M+D A YAD + V+ G +



A related DNA sequence was identified in S.pyogenes <SEQ ID 3171> which encodes the amino acid
sequence <SEQ ID 3172>. Analysis of this protein sequence reveals the following:

Possible site: 40

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAB11922 GB:Z99104 similar to ABC transporter (ATP-binding
protein) [Bacillus subtilis]
Identities = 146/259 (56%), Positives = 187/259 (71%), Gaps = 2/259 (0%)

KNK++K +RK VG+VFQFPE QLFEETVLKD++FGP NFGV E+AE ARE L I
E h + + + PFELSGGQMRRVAIAG+LAM P+VLVLDEPTAGLDP+GRKE+M +F +LHQ G
+T +LVTH M+D A YAD -t

Query: 16
Sbjct: 3

Query: 76
Sbjct: 63

Query: 136
Sbjct: 123

Query: 196
Sbjct: 183

Query: 255
Sbjct: 243

An alignment of the GAS and GBS proteins is shown below.

Identities = 218/280 (77%), Positives = 241/280 (85%)

Query: 1 MGIEFKOTSYTYQAGTPFEGRALFDWLKISDASYTAFIGHTGSGKSTIMQLLNGLHIPT 60
M I +NVSYTYQAGTPFEGRALF++NL I D SYTAFIGHTGSGKSTIMQLLNGLH+PT
Sbjct: 1 MSINLQNVSYTYQAGTPFEGRALFNINLDILD3SYTAFIGHTGSGKSTIMQLIiNGLHVPT 60



Query: 61 KGEVIVDDFSIKAGDKNKEIKFIRQKVGLVFQFPESQLFEETVLKDVAFGPQNFGISQIE 120 

G V VD I KNKEIK IR+ VGLVFQFPES QLFEETVLKDVAFGPQNFG+ S E 
Sbjct: 61 TGIVSVDKQDITNHSKNKEIKSIRiCHVGliVFQFPESQLFEETVLKDVAFGPQNFGVSPEE 120

Query: 121 AERLAEEIOjRLVGISEDLFDKNPFELSGGQMRRVAIAGILAMEPKVLVLDEPTAGLDPKG 180 

AE LA EKL LVGISE+LF+KNPFELSGGQMRRVAIAGIIAM+PKVLVLDEPTAGLDPKG 
Sbjct: 121 AEALAREKLALVGISEWLFEKNPFELSGGQMRRVAIAGILAMQPKVLVIjDEPTAGLDPKG 180 

Query: 181 RKELMTLFKNLHKKGMTIVLVTHLMDDVADYADYVYVLEAGKVTLSGQPKQIFQEVELLE 240 

RKELMT+FK LH+ GMTIVLVTHLMDDVA+YAD+VYVL+ GK+ LSG+PK IFQ+V LLE 
Sbjct: 181 RKELMTIFKKLHQSGMTIVLVTHLMDDVANYADFVYVLDKGKIILSGKPKTIFQQVSLLE 240 



Sbjct: 241 KKQLGVPKVTKLAQRLVDRGI P I S S L P I TLEELRE VLKHG 280 

SEQ ID 3170 (GBS401) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 76 (lane 3; MW 34.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 84 (lane 4; MW 59kDa). 

GBS401-GST was purified as shown in Figure 218, lane 2. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1029 

A DNA sequence (GBSx1099) was identified in S.agalactiae <SEQ ID 3173> which encodes the amino
acid sequence <SEQ ID 3174>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8707> which encodes amino acid sequence <SEQ ID 8708>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 8
Peak Value of UR: 0.65
Net Charge of CR: 1
McG: Discrim Score: -10.55
GvH: Signal Score (-7.5): 1.45






Possible site:
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1
ALOM program count: 6 value: -10.46 threshold:

INTEGRAL Likelihood = Transmembrane 41 - 57 ( 19 -
INTEGRAL Likelihood = Transmembrane 246 - 262 ( 243 - 263)
INTEGRAL Likelihood = Transmembrane 110 - 126 ( 104 -
INTEGRAL Likelihood = Transmembrane 23 - 39 ( 19 -
INTEGRAL Likelihood = Transmembrane 71 - 87 ( 71 -
INTEGRAL Likelihood = Transmembrane 193 - 209 ( 193 - 209)
PERIPHERAL Likelihood =        90
modified ALOM score: 2.59
icml HYPID: 7 CFP: 0.518

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11923 GB:Z99104 ybaF [Bacillus subtilis]
Identities = 133/263 (50%), Positives = 191/263 (72%)

Query: 7 MDKLILGRYIPGNSLIHKLDPRSKLLAMLLFIIIVFWANNVVTNVIVFIFTLVIVGLSQI 66 

MD +I+G+Y+PG SL+H+LDPR+KL+ + LF+ IVF ANNV T ++ +FT+ +V L+++ 
Sbjct: 2 MDSMIIGKXVPGTSLVHRLDPRTKLITIFLFVCIVFLANNVQTYALLGLFTIGWSLTRV 61 

Query: 67 KFSYFFNGIKPMVGIILFTTLFQMLFAQGGQVIFSFWIFSITSLGLQQAALIFMRFVLII 126 

FS+ G+KP++ I+LFT L +L G +IF + GL Q I +RFV +1 

Sbjct: 62 PFSFLMKGLKP1IWIVLFTFLLHILMTHEGPIIFQIGFSRVYEGGLVQGIFISLRFVYLI 121 

Query: 127 FFSTLLTLTTTPLSLADAVESLDKPLEVLRVPAHElGLMLSLSLRPVPTLMDDTiRIMNA 186 

+TLLTLTTTP+ + D +E LIi PL+ L++P HE+ LM+S+SLRF+PTLM++T +IM A 
Sbjct: 122 LITTLLTLTTTPIEITDGMEQLLNPLKKLKLPVHELALMMSISLRFIPTLMEETDKIMKA 181 

Query: 187 QRARGVDFGEGNLIHKVKSIIPILIPLFASSFKRADALAIAMEARGYQGGANRSKYRLLK 246 

Q ARGVDF G + +VK+I+P+L+PLF S + FKRA+ LA+AMEARGYQGG R+KYR h 
Sbjct: 182 QMARGVDFTSGPVKERVKAIVPLLVPLFVSAFKRAEELAVAMEARGYQGGEGRTICYRKLV 241 

Query: 247 WTVRDTFSILLMLLLGLSLFDLK 269 

WT +DT 1+ +++L LF L+ 
Sbjct: 242 WTGKDTSVIVSLIVLAALLFSLR 264 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3175> which encodes the amino acid 
sequence <SEQ ID 3176>. Analysis of this protein sequence reveals the following: 



Possible site: 53

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.50 Transmembrane 246 - 262 ( 243 - 265)
INTEGRAL Likelihood = -9.34 Transmembrane ( 103 - 135)
INTEGRAL Likelihood = -6.69 Transmembrane 41 - 57 ( 40 - 58)
INTEGRAL Likelihood = -2.81 Transmembrane 23 - 39 ( 21 - 40)
INTEGRAL Likelihood = -1.01 Transmembrane 62 - 78 ( 62 - 78)
INTEGRAL Likelihood = -0.27 Transmembrane 193 - 209 ( 193 - 209)

Final Results

bacterial membrane --- Certainty=0.4800 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAB11923 GB:Z99104 ybaF [Bacillus subtilis]
Identities = 138/263 (52%), Positives = 195/263 (73%)

Query: 1 MDKLILGRYIPGDSLIHRLDPRSKLIAMIIYIVIIFWANNVVTNLLMLTFTIjAvVFLSKI 60
MD +I+G+Y+PG SL+HRLDPR+KL+ + +++ I+F ANNV T h+ FT+ W L+++
Sbjct: 2 MDSMIIGKOTPGTSLVHRLDPRTKLITIFLWCIVFLANMVQTYAr.LGLFTIGVVSLrRV 61

Query: 61
SF + G+KP+I I+LFT L + + G+IF F+ + GLQII +RFV +1
Sbjct: 62

Query: 121
+TLLTLTTTP+ ++D +E LL PL + K+P HE+ LM+S+SLRF+PTLM++T +IM A
Sbjct: 122

Query: 181
Sbjct:

Query: 241
Sbjct: 242



An alignment of the GAS and GBS proteins is shown below. 

Identities = 210/263 (79%) , Positives = 237/263 (89%) 

Query: 7 roKLILGRYIPGNSLIHKLDPRSKLLAMLLFIIIVFWAHNVVTOVIVFIFTLVIVGLSQI 66 

MDKLILGRYIPG+SLIH+LDPRSKIirJW+++r- i -I+FWANNVVTN+++ FTL +V LS+I 
Sbjct: 1 MDKLILGRYIPGDSLIHRLDPRSKLIAMIIYIVIIB'WAMWTNLLMLTFTLAWFLSKI 60 

Query: 67 KFSYFFNGIKPMVGIILFTTLFQMLFAQGGQVIPSFWIFSITSLGLQQAALIFMRFVLII 126 

K S+F NG+KPM+GIILFTTIjFQM F+QGG+VIFS+W SIT LGL QA LIFMRFVLII 
Sbjct: 61 KLSFFLNGVKPMIGIILFTTLFQMFFSQGGKVIFSWWFISITDLGLSQAILIFMRFVLII 120 

Query: 127 FFSTLLTLTTTPLSLADAVESLLKPLEVIJIVPAHEIGLMIjSIiSLRFVPTLMDDTTRIMNA 186 
FFSTLLTLTTTPLSL+DAVESLLKPL +VPAHEJGLMLSLSLRFVPTBMDDTTRIMNA 
Sbjct: 121 FFSTLLTLTTTPLSLSDAVESLLKPLTRFKVPAHEIGLMLSLSLRFVPTLMDDTTRIMNA 180

Query: 187 QRARGVDFGEGI^IHKVKSIIPILIPLFASSFKRADALAIAMEARGYQGGAmSKYRIjLK 246 

QRARGVDFGEGNLI KVKSIIPILIPLFASSFKKADALAIAMEARGYQGG R+KYR L 
Sbjct: 181 QRARGVDFGEGNLIQKVKSIIPILIPLFASSFKRADAIAIAMEARGYQGGEGRTKYRQLD 240 

Query: 247 WTVRDTFSILLMLLLGLSLFLLK 269 

W +4-D+ +1 ++ LLGL IjF LK 
Sbjct: 241 WQLKDSLAIGIVSLLGLLLFFLK 263 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1030 

A DNA sequence (GBSx1101) was identified in S.agalactiae <SEQ ID 3179> which encodes the amino
acid sequence <SEQ ID 3180>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 45

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.05 Transmembrane 22 - 38 ( 16 - 43)

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S. pyogenes <SEQ ID 3181> which encodes the amino acid
sequence <SEQ ID 3182>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 116/233 (49%) , Positives = 140/233 (59%) , Gaps = 39/233 (15%) 

Query: 9 mTOOfflHLAYGAITLVALFSCIlAVMVIFKSSQVTTESLSKADKVRVAKKSK 61 

K N+K+ + 4G LVAL ILA++ F S T+S +K + +4 K 
Sbjct: 4 KENLKQRYFNFG LVALALTILAI I FAFSSKNADTKSYAKKSESKMVTIDKAPKNNHA 60 

Query: 62 MTKATSKSKVEDVKQAPKPSQASMEAPKSSSQSTEAMSQQQVTASEEAAVEQAWTKNTP 121 

+TK SK K + + P P+ ++ AP T +EE V Q VT 
Sbjct: 61 ITKEESKEKAKSIASEPIPTVENSVAP TVTEEVPWQQEVT 101 

Query: 122 ATSQAQQAYAVTETTYRPAQHQTSGQVLSNGKTAGAIGSAAAAQMAAATGVPQSTWEHII 181

Q V+ Y P + VLSNGNTAG +GS AAAQMAAATGVPQSTWEHI I 

Sbjct: 102 QTVQQVSSVAYNP NNWLSNGNTAGIVGSQAAAQMAAATGVPQSTWEHII 151 

Query: 182 ARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKRYRAQGLSAWGY 234 

ARESNGNPN AHRSGASGLFQTMPGWGSTATV+DQVN+A+KRY AQGLSAWGY 
Sbjct: 152 ARESNGNPNAAHASGASGI.FQTMPGWGSTATVEDQVNAALKAYSAQGLSAWGY 204 

A related GBS gene <SEQ ID 8713> and protein <SEQ ID 8714> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: 2.48 
GvH: Signal Score (-7.5): -3.74 

Possible site: 45 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -12.05 threshold: 0.0 

INTEGRAL Likelihood =-12.05 Transmembrane 22 - 38 ( 16 - 43) 
PERIPHERAL Likelihood =4.29 156 
modified ALOM score : 2.91 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

61.8/68.7% over 114aa 

Staphylococcus aureus 

GP | 7959131 | secretory protein SAI-B Insert characterized 
ORF01057(664 - 1002 of 1302) 

GP|7959131|dbj|BAA95959.1||AB042839 (119 - 233 of 233) secretory protein SAI-B
{Staphylococcus aureus} 
%Match =15.1 

%Identity =61.7 %Similarity =68.7 

Matches = 71 Mismatches = 34 Conservative Sub.s = 8 

438 468 498 528 558 588 618 648 
IFKSSQVTTESLSKADKAmVAKICSKMTKATSKSKVED^^

VDQAHLVDLAHNHQDQLNAAPIKDGAYDIH 

50 60 70 80 90 100 110 


678 708 735 762 792 822 852 882 

TEm'PATSCAQQAyAOTETTYRP-AQHQTSGQV-LSNGNTAGAIGSAAARQMAAATGVPQSTWEHIIARESNGNPNVANA 

: : II 11= II = II I lllllllll 11 = 11 II llll III HIIIIM I I 

SVSTOAQSSNSl^VEAVSAPTYHNYSTSTTSSSVHLSNGI^AGATGSEAAQI^QRTGVPASTOAailARESNGQWATMP 
130 140 150 160 170 180 190

912 942 972 1002 1032 1062 1092 1122 

SGASGLFQTMPGWGSTAOTQDQVNSAIKAYRAQGLSAWGY**IAIN^^ 
llllllllllllll I II bbhllbllll lib 
SGASGLFQTMPGWGPTNTVDQQINAAVKAYKAQGLGAWGF
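
The match statistics reported for this hit (Matches = 71, Conservative Sub.s = 8, %Identity = 61.7, %Similarity = 68.7) can be checked with simple arithmetic. The sketch below assumes that the denominator is the 115-residue span of the database entry covered by the alignment (positions 119 to 233); that choice is an inference, made because it reproduces the reported percentages, and is not stated in the text.

```python
def percent(count, aligned_length):
    """Percentage of aligned positions, rounded to one decimal place."""
    return round(100.0 * count / aligned_length, 1)


matches = 71
conservative_substitutions = 8

# Assumed denominator: residues 119 to 233 of the Staphylococcus aureus entry.
aligned_length = 233 - 119 + 1

identity = percent(matches, aligned_length)
similarity = percent(matches + conservative_substitutions, aligned_length)
print(identity, similarity)
```

Similarity here counts conservative substitutions together with exact matches, which is why it is always at least as large as the identity figure.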

SEQ ID 3180 (GBS25) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 5; MW 25kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 11; MW 50kDa), Figure 63
(lane 6; MW 50.3kDa), Figure 66 (lane 6; MW 50kDa) and in Figure 175 (lane 8 & 9; MW 50kDa).

Purified GBS25-GST is shown in Figure 9A, Figure 193 (lane 11) and Figure 210 (lane 5).

The purified GBS25-GST fusion product was used to immunise mice (lane 1+2+3 products; 20ug/mouse).
The resulting antiserum was used for Western blot (Figure 95B), FACS (Figure 95C), and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS
bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1031 

A DNA sequence (GBSx1103) was identified in S.agalactiae <SEQ ID 3183> which encodes the amino
acid sequence <SEQ ID 3184>. This protein is predicted to be L-serine dehydratase 1 (sdaA-2). Analysis of
this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.85 Transmembrane 205 - 221 ( 205 - 221)
INTEGRAL Likelihood = -0.59 Transmembrane 171 - 187 ( 171 - 187)
INTEGRAL Likelihood = -0.53 Transmembrane 226 - 242 ( 226 - 242)

Final Results

bacterial membrane --- Certainty=0.1341 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13459 GB:Z99112 similar to L-serine dehydratase [Bacillus subtilis]
Identities = 176/289 (60%), Positives = 224/289 (76%), Gaps = 1/289 (0%)

Query: 1 MFYTIEELVEOANSQHKGNIAELMIQTEIEMTGRSRSEIRYIMSRNLEVMKASVIDGLTP 60 

MF ++EL+E + + I+++MI E+E+T +++E+I M NL VM+A+V GL 
Sbjct: 1 MFT^KELIE-ITKEKQILISDVMIA2EKE\^ 59 


Query: 61 SKSISGLTGGDAVKMDQYLQSGKTISDTTIIiAAVRNAMAVWEIjNAKMGLVC^^ 120 

S +GLTGGDAVK+ Y++SGK++S IL AV A+A NE+NA MG +CATPTAGSAG 
Sbjct: 60 VTSQTGLTGGDAVKLQAYIRSGKSLSGPLILDAVSKAVATNEVNAAMGTICATPTAGSAG 119 
Query: 121 CLPAVISTAIEKLNLTEEEQLDFLFTAGAFGLVIGNNASISGAEGGCQAEVGSASAMAAA 180

+P + EKLN T E+ + FLFTAGAFG V+ NNASISGA GGCQAEVGSAS MA&A 
Sbjct: 120 WPGTLFAVKEKI^PTREQMIRFLFTAGAFGFVVANiNASISGAAGGCQAEVGSASGMAAA 179 

Query: 181 ALVMAAGGTPFQASQAIAFVIKNMLGLICDPVAGLVEVPCOT 240

A+V AGGTP Q+++A+A +KI<MLGIi+CDPVAGLVEVPCVKRNa+G+S A++AADMALA 
Sbjct: 180 AIVEMAGGTPEQSAEAMAITLKNIVMLVCIIPVA^ 239 

Query: 241 GIESQIPVDEVIDAMYQVGSSLPTAFRETAEGGLAATPTGRRySKEIFG 289 
           GI S+IP DEVIDAMY++G ++PTA RET +GGIAATPTGR K+IFG

Sbjct: 240 GITSRIPCDEVIDAMYKIGQTMPTALRETGQGGLAATPTGRELEKKIFG 288 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3185> which encodes the amino acid 
sequence <SEQ ID 3186>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.12 Transmembrane 196 - 212 ( 196 - 213)
INTEGRAL Likelihood = -0.27 Transmembrane 226 - 242 ( 226 - 242)

Final Results

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB13459 GB:Z99112 similar to L-serine dehydratase [Bacillus subtilis] 
Identities = 173/289 (59%) , Positives = 222/289 (75%) , Gaps = 1/289 (0%) 

Query: 1   MFYTIEELVKQADQQFNGNIAELMIATEVEMSGRNREDIIKIMSRNLQVMKAAVTEGLTS 60
           MF ++EL++ ++ I+++MIA E+E++ + +EDI + M NL VM+AAV +GL
Sbjct: 1   MFPJIvTCELIEITKEK-QILISDWIJiQEMEVTEKTKEDIFQQMDHNLSVMEAa.VQKGLEG 59

Query:
Sbjct: 60

Query: 121
           h FLFTAGAFG V+ NNASISGA GGCQAEVGSA+ M+AA
Sbjct: 120

Query: 181
           Q+++A+A +KN+LGLVCDPVAGLVEVPCVICRNA+GAS A++AADMALA
Sbjct: 180

Query: 241
           I S+IP DEVIDAMY++G MPTA RET +GGLAATPTGR
Sbjct: 240



An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/290 (84%) , Positives = 273/290 (94%) 

Query: 1 MFYTIEELVEQANSQHKGNIAELMIQTEIEMTGRSREEIRYIMSRNLEVMKASVIDGLTP 60 

MFYTIEELV+QA+ Q GNIAELMI TE+EM+GR+RE+I IMSRNL+VMKA+V +GLT 
Sbjct: 1 MFYTIEELVKQADQQFNGNIAELMIATEVEMSGRNREDI I KIMSRNLQVMKAAVTEGLTS 60 

Query: 61 SKSISGLTGGDAVICMDQYLQSGKTISDTTIIAAVRNAMAVNELNAKMGLVCATPTAGSAG 120 

+KSISGLTGGDAVKMD Y++ G ++SDTTIL AVRNA+AvNELNAKMGLVCATPTAGSAG 
Sbjct: 61 TKS I SGLTGGDAvTCMDNYI KKGNSLSDTTI IjNAVRNAI AVNEUIAKMGLVCATPTAGSAG 120 

Query: 121 CLPAVISTAIEKIJILTEEEQLDFLFTAGAFGLVIGNNASISGAEGGCQAEVGSASAMAaA 180 

CLPAV++TAIEKL+L+E+EQL+FLFTAGAFGLVIGNNASISGAEGGCQAEVGSA+AM+AA 
Sbjct: 121 CLPAVLATAIEKLDLSEKEQLEFLFTAGAFGLVIGNNASISGAEGGCQAEVGSAAAMSAA 180 

Query: 181 ALVT^GGTPFQASQAIAFVlKNMLGLICDPVAGLv^^ 240 
           ALV AAGGT QASQAIAFVIKN+LGL+CDPVAGLVEVPCVKRNALG+SFALVAADMALA
Sbjct: 181 ALVKAAGGTSHQASQAIAFVIK^LGIjVCDPVAGLVEVPCVKRNALGASFALVAADMALA 240 

Query: 241 GIESQIPVDEVIDAMYQVGSSLPTAFR3TAEGGLAATPTGRRYSKEIFGE 290 
           I+SQIPVDEVIDAMYQVGS++PTAFRETAEGGLAATPTGRRYS EIFGE

Sbjct: 241 DIDSQIPVDEVIDAMYQVGSAMPTAFRETAEGGLAATPTGRRYSVEIFGE 290 
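The percentages printed on an alignment summary line follow directly from the counts beside them, which gives a quick consistency check on figures that have passed through text extraction. The following is a sketch only, assuming the "name = n/m (p%)" layout used in this document; STAT and check_stats are hypothetical names. Because it is not stated here whether the reported percentages are truncated or rounded, the check accepts either.

```python
import re

# Matches e.g. "Identities = 244/290 (84%)"; optional spaces allow for OCR spacing.
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")

def check_stats(line):
    """Map each statistic name to (count, length, reported %, recomputed %, consistent?)."""
    out = {}
    for name, num, den, pct in STAT.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        exact = 100.0 * num / den
        # Accept a reported value that equals either the truncated or the rounded figure.
        consistent = pct in (int(exact), round(exact))
        out[name] = (num, den, pct, round(exact, 1), consistent)
    return out

# Summary line of the GAS/GBS alignment above.
stats = check_stats("Identities = 244/290 (84%) , Positives = 273/290 (94%)")
```

A value flagged as inconsistent only shows that a count, a length or a percentage was altered somewhere; it does not say which of the three is wrong.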

SEQ ID 3184 (GBS358) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 176 (lane 6; MW 35kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1032 

A DNA sequence (GBSx1104) was identified in S.agalactiae <SEQ ID 3187> which encodes the amino
acid sequence <SEQ ID 3188>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06216 GB:AP001515 L-serine dehydratase beta subunit [Bacillus halodurans] 
Identities = 101/216 (46%), Positives = 156/216 (71%), Gaps = 2/216 (0%)

Query: 4 LKFQSVFDIIGPVMIGPSSSHTAGATOIGKvVHSIFGE-PSEVTFHLYNSFAKTYQGHGT 62 

+K+++VFDIIGPVMIGPSSSHTAGA RIG+V ++FG+ P + Y SFA+TY+GHGT 

Sbjct: 1 MKYRTVFDIIGPVMIGPSSSHTAGAARIGRVARTLFGQQPSRCDIYFYGSFAETYKGHGT 60 


Query: 63 DKALVAGILGMDTDNPDIKNSLEIAHQKGIKIYVnDILHDSNSPHPOTAKITVKNGDRSMS 122 

D A+V GIL DT +P I SL++A +KG+++Y+ +++ + HPNTAK+ ++ G+ + 
Sbjct: 61 DVAIVGGILDFDTFDPRIPRSLQLAKEKGVRVYFHE-EEAITDHPNTAKVVLQKGEDQLE 119 

Query: 123 ITGVSIGGGNIQVTELNGFSVSLTMNTPTLIIVHQDIPGMIAKVTDILSDFNINIAQMNV 182

+ GVS1GGG I++ ELNGF + L+ N P +++VH D G+IA V+++L+ INI M V 
Sbjct: 120 WGVS1GGGKIEIVELNGFHLKLSGNHPAILWHTDRFGVIASVSNMLAKHEINIGHMEV 179 

Query: 183 TRESAGEKAIMI IEVDSRDCQQAVKKIEAIPHLHNV 218 
           +R+ G++A+M+IEVD ++++E +P++ V

Sbjct: 180 SRKEKGKEALMVIEVDQNVDDLLLQELERLPNIVTV 215 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3189> which encodes the amino acid 
sequence <SEQ ID 3190>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9161> which encodes the amino acid sequence
<SEQ ID 9162>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.300 (Affirmative) < succ>
bacterial membrane --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 187/223 (83%), Positives = 205/223 (91%), Gaps = 1/223 (0%)

Query: 1   MKHLKFQSVFDIIGPVMIGPSSSHTAGAWIGKWHSIFGE-PSEVTFHLYNSFAKTYQG 59
           M KFQSVFDI IGPVMIGPSSSHTAGAVRIGKWHSIFG+ P EVTFHLYNS FAKTY+G
Sbjct: 3   MNTQKFQSVFDIIGPVMIGPSSSHTAGAVRIGKWHSIFGDIPDFjVTFHLYNSFAKTYRG 62

Query: 60  HGTDKALVAGILGMDTDNPDIKNSLEIAHQKGIKIYWDILKDSNSPHPNTAKITVKNGDR 119
           HGTDKALVAGI+GM TDNPDIKNSLEIAHQXGIKIYWDILKDSN+PHPNT KI+VK D+
Sbjct: 63  HGTDKALVAGIMGMGTDNPDIKNSLEIAHQKGIKIYVmiLKDSNAPHPNTVKISVKKADK

Query: 120 SMSITGVSIGGGNIQvTELNGFSVSLTMNTPTLIIVHQDIPGMIAKVTDILSDFNINIAQ 179
           + +S +TGVS IGGGNI QVTELNGFS VSL+MNTPT++ VH+DIPGMIAKVTDILS NINIA
Sbjct: 123 TLSVTGVSIGGGNIQVTELNGFSVSLSMNTPTIVTVHKDIPGMIAKVTDILSSNNINIAT 182

Query: 180 MNVTRESAGEKAIMI IEVDSRDCQQAVKKIEAI PKLHNVNFFD 222
           MNVTRESAGEKA MIIEVDSR+CQ+A +1 IPH++NVNFFD
Sbjct: 183 MNVTRESAGEKATMI IEVDSRECQEAANQIAKIPHIYNVNFFD 225





SEQ ID 3188 (GBS151) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 31 (lane 3; MW 50kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 188 (lane 11; MW 25kDa) and in Figure 165
(lanes 14-16; MW 25.3kDa).
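The apparent molecular weights quoted for the fusion products can be sanity-checked against the length of the encoded protein. The sketch below rests on assumptions that are not taken from this document: an average residue mass of about 110 Da, a GST tag of roughly 26 kDa and a His tag with linker of roughly 1 kDa. The length of 222 residues is read from the last query coordinate of the GBS151 alignment above, and whether the expressed constructs were full length is not stated. All names are hypothetical.

```python
# Assumed constants (approximations, not values from the source document).
AVG_RESIDUE_KDA = 0.110   # average mass per amino acid residue, about 110 Da
GST_TAG_KDA = 26.0        # approximate mass of a GST fusion partner
HIS_TAG_KDA = 1.0         # approximate mass of a short His tag plus linker

def expected_mw_kda(n_residues, tag_kda=0.0):
    """Rough expected mass in kDa of a protein of n_residues plus an optional tag."""
    return n_residues * AVG_RESIDUE_KDA + tag_kda

# GBS151: 222 residues assumed from the alignment coordinates.
gst_fusion = expected_mw_kda(222, GST_TAG_KDA)   # compare with the ~50 kDa GST-fusion band
his_fusion = expected_mw_kda(222, HIS_TAG_KDA)   # compare with the ~25 kDa His-fusion band
```

Under these assumptions both estimates fall close to the band sizes reported for the two fusion products.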

The GBS151-GST fusion product was purified (Figure 198, lane 3; Figure 236, lane 8) and used to
immunise mice. The resulting antiserum was used for FACS (Figure 289), which confirmed that the protein
is immunoaccessible on GBS bacteria.

GBS151L was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 127 (lanes 8-10; MW 50kDa). GBS151L was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 127 (lanes 11 & 12; MW 25kDa), in
Figure 128 (lane 7; MW 25kDa) and in Figure 180 (lane 7; MW 25kDa). Purified GBS151L-His is shown
in Figure 232 (lanes 5 & 6) and in Figure 240 (lanes 3 & 4).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1033

A DNA sequence (GBSx1105) was identified in S.agalactiae <SEQ ID 3191> which encodes the amino
acid sequence <SEQ ID 3192>. This protein is predicted to be tRNA
(5-methylaminomethyl-2-thiouridylate)-methyltransferase (trmU). Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2208 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10291> which encodes amino acid sequence <SEQ ID
10292> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04980 GB:AP001511 (5-methylaminomethyl-2-thiouridylate)-methyltransferase
[Bacillus halodurans]
Identities = 250/359 (69%), Positives = 292/359 (80%), Gaps = 6/359 (1%)

Query: 32  RVWGMSGGVDSSVTALLLKEQGWIGVHMKNWDDTDEFGVCTATEDYKDVAAVADQIG 91
           RWVGMSGGVDSSVTALLLKEQGYDVIG+FMKNWDDTDE GVCTATEDY+DV V +Q+G
Sbjct: 10

Query: 92
           I YY+VNFEKEYWD+VF YFL EY+AGRTPNPDVMCNKEIKFKAFL++A+TLGADYVATG
Sbjct: 70

Query: 152
           +G ++RG D NKDQTYFL+ LSQ+QL + +FPLGHL+*
Sbjct: 130

Query: 212
Sbjct:

Query: 272 3IHFTRDMPNEFKLEC 331
           GG +PWFV+GK+L KNI LYVGQGF+H L S LA ++++ ++ EC
Sbjct: 249

Query: 332
           TAKFRYRQPD KVTVY + + A V+F + QRAITPGQAWFY+ CLGGG ID +
Sbjct: 305 TAKFRYRQPDQKVTWPQSIXaVEVLFAEPQRAITPGQAVVFYDGDVCLGGGTIDHVLK 363

A related DNA sequence was identified in S.pyogenes <SEQ ID 3193> which encodes the amino acid 
sequence <SEQ ID 3194>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.1691 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 331-3

The protein has homology with the following sequences in the databases:

>GP:BAB04980 GB:AP001511 (5-methylaminomethyl-2-thiouridylate)-methyltransferase
[Bacillus halodurans]
Identities = 255/359 (71%), Positives = 293/359 (81%), Gaps = 6/359 (1%)

Query: 14 RVWGMSGGVDSSVTALLLKEQGYDVIGVFMiamDDTDEFGVCTATEDYKDVAAVADKIG 73 

RVWGMSGGVDSSVTALLLKEQGYDVIG+FMKNWDDTDE GVCTATEDY+DV V +++G 
Sbjct: 10 RVWGMSGGVDSSVTALLLKEQGYDVIGIFMKNWDDTDENGVCTATEDYQDWQVCNQLG 69 

Query: 74 IPYYSVNFEKEYWDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYAMTLGADYVATG 133 

I YY+VNFEKEYWD+VF YFL EY+AGRTPNPDVMCNKEIKFKAFL++A+TLGADYVATG 
Sbjct: 70 IAYYAWFEKEYWDKVFTYFLEEYKAGRTPNPDVMCNKEIKFKAFLNHALTLGADYVATG 129 

Query: 134 HYAQVKRDENGTVHMLRGADNGKDQTYFLSQLSQEQLQKTLFPLGHLQKSEVREIAERAG 193 

HYAQVK + +G ++RG D KDQTYFL+ LSQ+QL + +FPLGHL+K EVR IAERAG 
Sbjct: 130 HYAQVK-NVDGQYQLIRGKDPNKDCTYFIJmLSQ^LSRvW^ 188 
Query: 194 LATAKKKDSTGICFIGEKNFKQFLSQYLPAQKGRMMTIDGRDMGEHAGLMYYTIGQRGGL 253

IATAKKKDSTGICFIG+++FK+FLS YLPAQ G M T+DG G H GLMYYT+GQR GL 
Sbjct: 189 IATAKKKDSTGICFIGKRDFKEFLSSYLPAQPGEMQTLDGEVKGTHDGLMYYTLGQRQGL 248 

Query: 254 GIGGQHGGDNQPWFWGKDLSQNILYVGQGFYHEALMSNSLDflSVIHFTREMPEEFTFEC 313 

GI GG +PWFV+GK+L +NILYVGQGF+H L S L A +++ + FEC 

Sbjct: 249 GI GGSGEPWFVIGKNLEKNILYVGQGFHHPGLYSEGLRAIKVNWILRRESDEPFEC 304 

Query: 314 TAKFRYRQPDSHVAVHVRGDKA-EWFAEPQRAITPGQAWFYDGKECLGGGMIDMAYK 371 

           TAKFRYRQPD V V+ + D A EV+FAEPQRAITPGQAWFYDG CLGGG ID K
Sbjct: 305 TAKFRYRQPDQKOTVYPQSDGAVEVLFAEPQRAITPGQAWFYDGDVCLGGGTIDHVLK 363 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 332/377 (88%) , Positives = 349/377 (92%) 

Query: 21 GRILMTDWSWIRWVGNKGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDY 80 

G MTDNS IRWVGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDY 
Sbjct: 3 GEFFMTDNSKIRVWGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDY 62 

Query: 81 KDVAAVADQIGIPYYSVNFEKEYI'JDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYA 140 

KDVAAVAD+IGIPYYSVNFEKEYWDRVFEYFIAEYRAGRTPNPDVMCNKEIKFKAFLDYA 
Sbjct: 63 KDVAAVADKIGIPYYSVNFEKEYWDRVFEYFIAEYRAGRTPNPDVMCNKEIKFKAFLDYA 122 

Query: 141 MTLGADYVATGHYAQVTRDENGIVHMLRGADNNKDQTYFLSQLSQEQLQKTLFPLGHLQK 200 

MTLGADYVATGHYAQV RDENG VHMLRGADN KDQTYFLSQLSQEQLQKTLFPLGHLQK 
Sbjct: 123 MTLGADYVATGHYAQVKRDENGTVHMLRGADKGKDQTYFLSQLSQEQLQKTLFPLGHLQK 182 

Query: 201 PEVRRIAEEAGIATAKKKDSTGICFIGEKI^iFKDFLGQYLPAQPGR^lMTVDGRDMGEHAGL 260

EVR IAE AGLATAKKKDSTGICFIGEKNFK FL QYLPAQ GRMMT+DGRDMGEHAGL 
Sbjct: 183 SEVREIAERAGLATAKKKDSTGICFIGEKNFKQFLSQYLPAQKGRMMTIDGRDMGEHAGL 242 

Query: 261 MYYTIGQRGGLGIGGQHGGDNKPWFWGKDLSKNILYVQQGFYHDSLMSTSLTASEIHFT 320 

MYYTIGQRGGLGIGGQHGGDN+PWFWGKDLS+NILYVGQGFYH++LMS SL AS IHFT 
Sbjct: 243 MYYTIGQRGGLGIGGQHGGDNQPWFWGKDLSQNILYVGQGFYHEALMSNSLDASVIHFT 302 

Query: 321 RDMPI^FKLECTAKFRYRQPDSKOTVYVKGNQARWFDDLQRAITPGQAWFYNEQECIjG 380 

R+MP EF ECTAKFRYRQPDS V V+V+G++A WF + QRAITPGQAWFY+ +ECLG 
Sbjct: 303 REMPEEFTFECTAKFRYRQPDSHVAVHVRGDKAEWFAEPQRAITPGQAWFYDGKECLG 362 

Query: 381 GGMIDQAYRDDKICQYI 397 

GGMID AY++ + CQYI 
Sbjct: 363 GGMIDMAYKNGQPCQYI 379 
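The identity and positive counts reported for an alignment can be recomputed from the aligned rows themselves. The sketch below assumes the three-row layout used here (query row, match row, subject row), in which the match row repeats a letter at identical positions and shows "+" at conservative substitutions; the function name is hypothetical. It is applied to the final block of the alignment above, whose three rows were extracted without damage.

```python
def alignment_counts(query, midline, sbjct):
    """Count identities, positives and gap columns in one aligned block.

    Identities are counted from the two sequence rows; positives are
    identities plus the '+' marks on the match row.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    conservative = midline.count("+")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, identities + conservative, gaps

# Last block of the GAS/GBS alignment (query 381-397, subject 363-379).
counts = alignment_counts(
    "GGMIDQAYRDDKICQYI",
    "GGMID AY++ + CQYI",
    "GGMIDMAYKNGQPCQYI",
)
```

Summing these per-block counts over a whole alignment should reproduce the totals on its summary line, provided every row survived extraction intact.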

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1034 

A DNA sequence (GBSx1106) was identified in S.agalactiae <SEQ ID 3195> which encodes the amino
acid sequence <SEQ ID 3196>. Analysis of this protein sequence reveals the following:

Possible site: 29 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL Likelihood =-11.78 Transmembrane 40 - 56 ( 36 - 73)
INTEGRAL Likelihood = -4.35 Transmembrane 68 - 84 ( 65 - 86)
INTEGRAL Likelihood = -3.50 Transmembrane 180 - 196 ( 175 - 199)

Final Results

bacterial membrane --- Certainty=0.6137 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
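The predicted transmembrane segments can be extracted from the INTEGRAL lines in the same way. This is a minimal sketch, assuming the line layout shown above; TM_LINE and parse_tm_segments are hypothetical names. It also assumes that the most negative likelihood marks the most strongly predicted segment, which is consistent with the "value: -12.84" reported for this protein in the ALOM summary but is not stated explicitly in this document.

```python
import re

# INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text):
    """Return dicts for each predicted segment, ordered by start position."""
    segments = []
    for like, start, end, ext_start, ext_end in TM_LINE.findall(text):
        segments.append({
            "likelihood": float(like),
            "core": (int(start), int(end)),
            "extended": (int(ext_start), int(ext_end)),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])

# Lines copied from the GBSx1106 block above.
sample_tm = """
INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL Likelihood =-11.78 Transmembrane 40 - 56 ( 36 - 73)
INTEGRAL Likelihood = -4.35 Transmembrane 68 - 84 ( 65 - 86)
INTEGRAL Likelihood = -3.50 Transmembrane 180 - 196 ( 175 - 199)
"""
segments = parse_tm_segments(sample_tm)
strongest = min(segments, key=lambda seg: seg["likelihood"])
```

Ordering by start position makes it easier to compare the predicted topology of the GBS protein with that of a homologue.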

The protein has homology with the following sequences in the GENPEPT database. 
>GP:CAB15390 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis]
Identities = 71/202 (35%) , Positives = 120/202 (59%) , Gaps = 5/202 (2%) 

Query: 1 MISKFILAF^FFAIMNPISNLPAPMALVADDIDQKISRRIAA.K3VLIAFVIIVIFVLSGH 60 
          M S + F++ FA+ NPI N+P F+ L + IA K +L+F 1+ F++ GH

Sbjct: 2 MFSFIVHVFISLFAVSNPIGtWPIFLTLTEGYTAAERKAIARKAAILSFFILAAFLVFGH 61 

Query: 61 LLFNLFGITIAALKISGGILVGIIGYKMINGIHSPTNK-NLEEHKD--DPM1WAVSPLAM 117 
L+F LF I + AL+++GGI + I Y ++N S + +EHK+ + +++V+PL++ 

Sbjct: 62 LIFKLFDINIHALRVAGGIFIFGIAYNLLNAKESHVQSLHHDEHKESKEKADISVTPLSI 121

Query: 118 PLIAGPGTIATAMGLSSG--GLSGKLITILAFAILCVIMYVILISANEITKFLGKNAMTI 175 

P++AGPGTIAT M LS+G G+ ++ A + + ++ +1+ LGK M + 

Sbjct: 122 PI1AGPGTIATVMSLSAGHSGIGHYAAVMIGIARVIALTFLFFHYSAFISSKLGKTEMNV 181 


Query: 176 ITKMMGLILMTIGIEMLITGIK 197 

IT++MGLIL + + M+ G+K 
Sbjct: 182 ITRLMGLILAWAVGMIGAGLK 203 

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8715> and protein <SEQ ID 8716> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 9.79
GvH: Signal Score (-7.5): -1.53
Possible site: 29
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 4 value: -12.84 threshold:
INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL Likelihood =-11.78 Transmembrane 40 - 56 ( 36 - 73)
INTEGRAL Likelihood = -4.35 Transmembrane 68 - 84 ( 65 - 86)
INTEGRAL Likelihood = -3.50 Transmembrane 180 - 196 ( 175 - 199)
PERIPHERAL Likelihood = 27 110
modified ALOM score: 3.07

*** Reasoning Step: 3



Final Results 

bacterial membrane --- Certainty=0.6137 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00620(301 - 891 of 1209)
OMNI|NT01BS3953 (11 - 212 of 220) conserved hypothetical protein

%Match = 15.8
%Identity = 35.5 %Similarity = 61.5
Matches = 71 Mismatches = 74 Conservative Sub.s = 52

96 126 156 186 216 246 276 306

VQLSSDIVm,TVKLQFT*KVIKQGLCLMIYNEQSHQVKLLFFIMNKOT 

I 

VQRLSTRRYMMF 



SKFIIJU?MAFFAIMOTISNLPAFMALVADDDQKISRRIAAKGvXjkAFVI 

I = h::||: III 1 = 1 hi = II I =hl h h= Ilhl II I : ||:::|||:: 

SFIVHVFISLFAVSNPIGOTPIFLTLTEGYTAAERfCAIARKMILSFFIIiAAFDVFGHLIFKLFDINIHALRvAGGIFIF 



IIGYKMINGIHSPTNK-NLEEHKD- -DPMNVAVSPLAMPLLAGPGTIATAMGLSSG- -GLSGKLITILAFAILCVIMYVT 
11-11 = :||h = > = « h I h = h » I III I I I I I Ihl h :: | : : :: 
GIAYNLLNAKESHVQSLHHDEHKESKEKADISVTPLSIPIIAGPGTffiTVMSLSAGHSGIGHYAAVMI

110 120 130 140 150 160 170

801 831 851 891 921 951 981 1011

LISAlffiITKFLGKNAMTIITKIWGLILMTIGIEMLITGIKIGFHXT*PIPSG*LLKDKC*NKFHXNyDGQSSWNL*VFLT
: = I: III I :||::|llll '■ > 1= 1 = 1 
FHYSAFISSKLGKTEMNVITRLMGLILAWAVGMIGAGLKGMFPVLTS 
190 200 210 220 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1035 

A DNA sequence (GBSx1107) was identified in S.agalactiae <SEQ ID 3197> which encodes the amino
acid sequence <SEQ ID 3198>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1747 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10289> which encodes amino acid sequence <SEQ ID
10290> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45494 GB:U80409 glucose inhibited division protein homolog
GidA [Lactococcus lactis subsp. cremoris]
Identities = 394/524 (75%), Positives = 458/524 (87%), Gaps = 2/524 (0%)

Query: 13 KTLIATINLEMLAFMPC^PSIG<3SAKGIVVF£IDALG<3EMGKNIDKTYIQMKMRTIGKGP 72

KTLIj TINL M+AFMPOTPSIGGSAKGIVvREIDALGGEMG+NIDKTYIQMKMLNTGKGP 
Sbjct: 12 KTLLMTINUmVAFMPCNPSIGGSAKGIVWEIDALGGEMGRNIDKTYIQMKMLNTGKGP 71 



Query: 73 AV1^RAQADKALYAQTMKQTVEKQENLTDRQAMIDEILVEDGK--WGVRTATNQKFSA 130
          AVRALRAQADK YA +MK TV QENLTURQ M++E++++D K V+GVRT+T ++ A
Sbjct: 72 AVRALRAQADKDEYA ^ lSMKNTVSDQENLTLRQG^^VEELILDDEKQKVIGVRTSTGTQYGA 131

Query: 131 KSWITTGTALRGEIIIfiDLKYSSGPNNSIASVTLADNLRDLGLEIGRFKTGTPPRVKAS 190
           K+V+ITTGTALRGEII+G+LKYSSGPNNSL+S+ LADNLR++G EIGRFKTGTPPRV AS
Sbjct: 132 KAVIITTGTALRGEI1IGELKYSSGPNNSLSSIGLADNLREIGFEIGRFKTGTPPRVLAS 191



Query: 191 SINYEKTEIQPGDEQPNHFSFMSRDEDYITDQVPCTILTYTNTLSHDIINQNLHRAPMFSG 250 

SI+Y+KTEIQPGDE PNHFSFMS DEDY+ DQ+PCWLTYT SH 1+ NLHRAP+FSG 
Sbjct: 192 SrDYDKTEIQPGDEAPNHFSFMSSDEDYDKDQIPCTfLTYTTENSHTILRDNLHRAPLFSG 251 

Query: 251 IVKGVGPRYCPSIEDKIVRFADKERHQLFIiEPEGRYTEEVYVQGLSTSLPEDVQVDLIiRS 310 

IVKGVGPRYCPSIEDKI RFADK RHQLFLEPEGR TEEVY+ GLSTS+PEDVQ DL++S 
Sbjct: 252 IVKGVGPRYCPSIEDKITRFADKPRHQLFLEPEGRNTEEVYIGGLSTSMPEDVQFDLVKS 311 

Query: 311 IKGIiENAE^RTGYAIEYDIVLPHQIjSATLETK\ r IAGLFTAGQTNGTSGYEEAAGQGLVA 370 

I GLENA+MMR GYAIEYD+V+PHQLR TLETK+I+GLFTAGQTNGTSGYEEAAGQGLVA 
Sbjct: 312 IPGLENAKNMRPGYAIEYUVVMPHQLRPTLETKLISGIjFTAGQTNGTSGYEFAAGQGLVA 371 

Query: 371 GINAALKVCX3KPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSRAEYRLILRHDNADMR 430 

GINAALK+QGKPE ILKRS+AYIGVMIDDLVTKGTLEPYRLLTSRAEYRLILRHDMaD R 
Sbjct: 372 GINAALKIQGKPEFILKRSEAYIGVMIDDLVTKGTLEPYRLLTSRAEYRLILRHDNADRR 431 

Query: 431 LTEIGYEIGLVDEERYAIFKKRQMQFENELERLDSIKLKPVSETNKRIQELGFKPLTDAL 490 

LTEIG ++GLV + ++ ++ + QF+ E++RL+S KLKP+- +T +++ +LGF P+ DAL 
Sbjct: 432 LTEIGRQVGLVSDAQWEHYQAKMAQFDREMKRIBSEKIiKPLPDTQEKLGKIiGFGPIKDAlj 491 
Query: 491 TAKEFMRRPQITYAVATDFVGCSDEPLDSKVIELLETEIKY3GY 534

T EF++RP++ Y DF+G A E +D V EL+ETEI YEGY 
Sbjct: 492 TGAEFLKRPEVNYDEVIDFIGQAPEVIDRTVSELIETEITYEGY 535 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3199> which encodes the amino acid 
sequence <SEQ ID 3200>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1064 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 530/610 (86%) , Positives = 574/610 (93%) 

Query: 1 MEASraAASRMGCKTLIATINLEMrAFMPCNPSIGGSAKGI\A«EIDALGGEMGKNrDKTY 60 

+EASIA SHMGCKTLLATINL+MLAFMPCNPS1GGSAKGIWREIDALGGEMGKNIDKTY 
Sbjct: 21 VEASLATSRMGCKTLLATINLDMLAFMPCNPSIGGSAKGIWREIDALGGEMGKNIDKTY 80 

Query: 61 IQMKMLOTGKGPAVEALPAQADKALYAQTMKQTVEKQENLTLRQAMIDEILVEDGKWGV 120 

IQMKMLNTGKGPAVRALRAQADK+LYA+ MK TVEKQ NLTLRQ MID+ILVEDG+WGV 
Sbjct: 81 IQMKMLNTGKGPAVRALRAQADKSLYAREMKHTVEKQANLTLRQTMIDDILVEDGRVVGV 140 

Query: 121 RTATNQKFSAKSWITTGTALRGEIILGDLKYSSGPNNSIASVTIADNLRDLGLEIGRFK 180 

TAT QKF+AK+W+TTGTALRGEIILG+LKYSSGPNNSIASVTIADNL+ LGLEIGRFK 
Sbjct: 141 LTATGQKFAAKAWVTTGTALRGEI ILGELKYS SGPNW S IAS VTLADNLKKLGIiE IGRFK 200 



TGTPPRVKAS S 1NY+ +TEIQPGD+ + PNHFSFMS+D DY+ DQ+PCWLTYTN SHDIINQ 
Sbjct: 201 TGTPPRVKASS INYDQTEIQPGDDKPNHFSFMSKDADYLKDQI PCWLTYTNQTSHDI INQ 260 

Query: 241 NLHRAPMFSGIVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGRYTEEVYVQGLSTSLP 300 

NL+RAPMFSGIVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGR TEEVYVQGDSTSLP 
Sbjct: 261 NLYRAPMFSGIVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGRDTEEVYVQGLSTSLP 320 

Query: 301 EDVQVDLLRSIKGLENAEMMRTGYAIEYDIVLPHQLRATLETKVIAGLFTAGQTNGTSGY 360 

EDVQ DL+ SIKGLE AEMMRTGYAIEYDIVLPHQLRATLETK+I+GLFTAGQTNGTSGY 
Sbjct: 321 EDVQKDLIHSIKGLEKAEMMRTGYAIEYDrVLPHQLRATLETKLISGLFTAGQTNGTSGY 380 

Query: 361 EEAAGCGLVAGIHAALKVQGKPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSxRAEYRL 420 

EEAAGQGL+AGINAALKVQGKPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSRAEYRL 
Sbjct: 381 EEAAGQGLIAGINAALKVQGKPELIIjKRSDAYIGVMiDDLVTKGTLEPYRLLTSRAEYRL 440 



Query: 481 LGFKPLTDALTAKEFMRRPQITYAVATDFVGCADEPLDSKVIELLETEIKYEGYIKKALD 540 

LGFKPLTDA+TAKEFMRRP+I YA A FVG A E LD+K+IELLETEIKYEGYI+KALD 
Sbjct: 501 LGFKPLTDAMTAKEFMRRPEIDYATAVSFVGPAAEDLDAKIIELLETEIKYEGYIRKALD 560 

Query: 541 QVAKMKRMEEKRIPPHIDWDDIDSIATEARQKFKKINPETLGQASRISGVIIPADISILMV 600 

QVAKMKRMEEKRIP +IDWD IDSIATEARQKFKKINPET+GQASRISGVNPADISILM+ 
Sbjct: 561 QVAKMKRMEEKRIPTNIDWDAIDSIATEARQKFKKiNPETIGQASRISGVNPADISILMI 620 

Query: 601 YLEGRQKGRK 610 

YLEG K + 
Sbjct: 621 YIjEGNGKAHR 630 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1036

A DNA sequence (GBSx1108) was identified in S.agalactiae <SEQ ID 3201> which encodes the amino
acid sequence <SEQ ID 3202>. Analysis of this protein sequence reveals the following:

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07750 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 205/644 (31%), Positives = 362/644 (55%), Gaps = 28/644 (4%)

Query: 35 LLLAI FVALS FWALLYYQ KITYELSEVEQIELLNDQTE 73 

++ + VAL F++AL +YQ +I++E + I L+ + 

Sbjct: 14 VIALIAVALVFLIALSFYQWQLGVlGVLLLIiVTAIFSLRARISFERDLEQYISTLSYRVH 73 

Query: 74 VSLKSLLEQMPVGVIQFDLETNDIEWFNPYA-ELIFTGDNGHFQSATVKDIITSRRNGTA 132 

+ + + Q+PVG+I ++ + ++W NPYA E + + +++ + GT 

Sbjct: 74 KAGEEAVTQLPVGMILYNDQLR-VQWTOPYAAEHLPKAEIDASLEELSPELVRALEEGTD 132 

Query: 133 GQSFEYGDNKYSAYLDTETGVFYFFDMFMGNRRNYDSSMLRPVIGIISIDNYDDIMDTML 192 

Q + Y + YFFD R + +PV+ 1 +DNYD++ M 

Sbjct: 133 EQKIVIEEKTYDCTFKPNERLIYFFD1TES2RMHQQFEESQPVLTFIYLDNYDEVTQGME 192 

Query: 193 EADMSKINAFVTSFISDFTQS^FYRRVNMDRYYIF^ 252 

+ s++ + VTS ++ + ++F RR DR+ Y L + K KF IL+E R+ 

Sbjct: 193 DQTOSPJLMSQOTSSMQWANEHDLFLRRTAADRFIAVMSYGSLLAIEKTKFGILDEIRET 252 

Query: 253 AQEmLSLTLSMGISYGDGNHNQIGQIALENl^ABvRGGDQIVTOENDSSKKALYFGGG 312 

+ + LTLS+G+ YGD + ++GQ+A +L+ AL RGGDQ+ +++ K ++GG 

Sbjct: 253 TGKEKIPLTLSIGVGYGDLSLRELGQIiAQSSLDLALGRGGDQVAIKQRTG- - KVRFYGGK 310 

Query: 313 AVSTIKRSRTRTRAMMTAISDRLKVVDSVFIVGHRKLDMDALGASVGMQFFASNIVNASY 372 

+ + KR+R RR-HA+D+ DV ++GH+ DMDA+GA++G+ A ++ 
Sbjct: 311 SNAMEKRTRVRARVISHAI.RDFVLESDRVIVMGHKNPDMDAVGAAIGILKIAEVNDREAF 370 

Query: 373 WYDPNDMNSDIERAIDYLQEDGET- -RLVSVERAFELITQNSLLVMVDHSKTALTLSKE 430 

W DPND+N D+ + ++ ++++ + + ++ E + EL+T+ +LLV+VD K ++ + 
Sbjct: 371 VVLDPNDVNPDVSKLMEEVEKNEQLWDKFITPEESLELMTEETLLVIVDTHKPSMVIEPR 430 

Query: 431 FFNKFADVIVVDHHRRDEDFPKNAVLSFIESGASSASELVTELIQFQQAKDKLSRSQASI 490 

+ V+V+DHHRR E+F ++ VL ++E ASS +ELVTEL+++Q K K+ +++ 
Sbjct: 431 LLDYVERVVVLDHHRRGEEFIEDPVliVYMEPYASSTAELVTELLEYQPKKLKMDILESTA 490 

Query: 491 U1AGIMLDTRNFASNVTSRTFDVASYLRGLGSNSMAIQKISATDFDEYRLINELILKGER 550 

L+AG+++DT++FA +RTFD AS+LR G++++ +QK+ D + Y +L+ + 
Sbjct: 491 LIAGMIvUTKSFAIRTGARTFDAASFLRSHGADTvTjVQKLLKEDLNHYVKPJUCLVETAKL 550 

Query: 551 IYDNIIVATGEEHKOTSHVIASKAADTMLTMAGIEATFVITKNSSN-IGISARSRNNINV 609 

D + +AT E + S ++ ++AADT+LTM G+ A+FVT++ + 1SARS ++NV 

Sbjct: 551 YRDGiMAIATAREEEAVSQLLIAQAADTLLTMKGWASFVISRRHDGVVSISARSLGDVNV 610 



Query: 610 C 

Q IME L GGGH + AA Q +D ++++ L E 1D+ L S 
Sbjct: 611 QLIMESLDGGGHLTNAATQFEDATLEEAEAKLKEAIDQYLEGGS 654 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3203> which encodes the amino acid 
sequence <SEQ ID 3204>. Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-18.57 Transmembrane 33 - 49 ( 6 - 56)
INTEGRAL Likelihood =-10.14 Transmembrane 12 - 28 ( 6 - 32)

Final Results

bacterial membrane --- Certainty=0.8429 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB07750 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 199/659 (30%), Positives = 367/659 (55%), Gaps = 16/659 (2%)

Query: 1   MKKF RFETIHLI-MMGLILFGLLALCVSIMQSKILILLAIPLVLLFW-ALLWYQKE 55
           M KF R+ H+I ++ + L L+AL Q ++ +L + ++ +F + A + ++++
Sbjct:     MPKFLLIOIWHGYHVIALLAVALVFLIALSFYQWQLGVIGVLLLLVIAIFSLRARISFERD 60

Query: 56
Sbjct: 61

Query: 115
Sbjct: 115

Query: 175
           V+ I +DNYD++T + D S++ S V + ++++ +F RR DR+
Sbjct: 175

Query: 235
           KF +L+E R+ + PLTLSIG+ +G+ + ++GQ+A +L++AL RGGDQ
Sbjct: 235

Query: 295
           D V ++GH+ DMDA+
Sbjct: 295

Query: 355
           +F V +P++++PD+ -
Sbjct: 353

Query: 413
           +LLV+VD K S+ + + + V+V+DHHRR ++F ++ +L ++E ASS AELVTE
Sbjct: 413

Query: 473
           r L+AG+++DTK+F+ R +RTFD AS+LRS G+D+V +Q -i
Sbjct: 473

Query: 533
           t* ++AADT+L+M V ASFV+
Sbjct: 533

Query: 593
           ++ISARS +NVQ +ME L GGGH AA Q D +L +A+ L + 14 ++
Sbjct: 593 ;WSISARSLGDVNVQLIM3SLDGGC-ELTNAATQFEDATLEEAEAKLKEAIDQYLE 651

An alignment of the GAS and GBS proteins is shown below. 

Identities = 428/658 (65%), Positives = 547/658 (83%), Gaps = 1/658 (0%) 

Query: 1 MKRFRFATVHLVLIGLILFGLLAICVRLFQSYTALLlAIFVmSFWALLYYQKITYELS 60 

MK+FRF T+HL+++GLILFGLLA+CV 4 QS +LLAIF+ L FWALL+YQK Y+LS 
Sbjct: 1 MKKFRFETIHLI^LILFGLIALCVSIMQSKILILLAIFLVLLFWALLWYQKEAYQLS SO 

Query: 61 EVEQIELLNDQTEVSLKSLLEQMPVGVIQFDLETNDIEWFNPYAELIFTGDNGHFQSATV 120 
Query: 121 KDIITSRRNGTAGQSFEYGDNKYSAYLDTETGVFYFFDNFMGNRRNYDSSMLRPVIGIIS 180

+ I IT +R Q+FE NKY++Y+D +G+FYFFD+F+GNR+ D+SMLRPV+GIIS 

Sbjct: 121 QQIITEKRREDISQTFEVSGNKYTSYIDVSSGIFYFFDSFVGNRQLRDASMLRPWGIIS 180 

Query: 181 IDNYDDI^TMLFADMSKINAFVTSFISDFTQSKNIFYRRVNMDRYYIFTDYSVIOTLIK 240 

+DNYDDI D + +AD SKIN+FV +FI +F +SK IFYRRVNMDRYY FTD+ EN L+ 
Sbjct: 181 VDOTDDITDDLSDJfflTSKINSFVANFIDEF^SKRIFYRRVN^RYYFFTDFKTRIDLMD 240 

Query: 241 DKFDIIJffiFRKRAQESFHLSLTI^MGISYGDGNHNQIGQIALENIiNTALVRGGDQIVVREN 300 

+KF +L EFRK AQ+ LTLS+GIS+G+ NH+QIGQ+ALENLN ALVRGGDQIV+REN 
Sbjct: 241 NKFSA7LEEFRKEAQDAQRPLTLSIG1SFGEENHSQIGQVALENLNIALVRGGDQIVIREN 300 

Query: 301 DSSKKALYFGGGAVSTIKRSRTRTRAMMTAISDRLKVl'DSVFIVGHRKLDMDALGASVGM 360 

+YFGGG+VST+KRSRTRTRAMMTAISDR+K+VD+VFIVGHRKLDMDALG++VGM 
Sbjct: 301 ADHTNPIYFGGGSVSTVKRSRTRTRAWITAISDRIIWDNVFIVGHRKLDMDALGSAVGM 360 

Query: 361 QFFASNIVNASYWYDPNDMNSDIERAIDYLQEDGETRLVSVERAFELITQMSLLVMVDH 420 

QFFA NI+ £+ VY+P++M+ DIERAI+ LQ DG+TRL+SV +A L+T SLLVMVDH 
Sbjct: 361 QFFAGNIIENSFAVYWPDEMSPDIERAIERLQADGKTRLISVSQAMGLVTPRSLLVMVDH 420 

Query: 421 SKTALTLSKEFFNKFADVIVVDHHRRDEDFPKNAVLSFIESGASSASELVTELIQFQQAK 480 

SK +LTLSKEF+ +F +VIWDHHRRD+DFP NA+L+FIESGASSA+ELVTELIQFQ AK 
Sbjct: 421 SKISLTLSKEFYEQFQMVIWDHHRRDDDFPDNAILTFIESGASSAAELVTELIQFQNAK 480 

Query: 481 DICLSRSQAS1LMAGIMLDTRNFASNVTSRTFDVASYLRGLGSN3MA.IQKISATDFDEYRL 540 

L++ QAS+LMAGIMLDT+NF++ VTSRTFDVASYLR GS+S+ IQ ISATDF+EY+ 
Sbjct: 481 KCLNKIQASVLMAGIMLDTKNFSTRVTSRTFDVASYLRSKGSDSVEIQNISATDFEEYKQ 540 

Query: 541 INELILKGERI YDNI IVATGEEHKVYSHVIASKAADTMLTMAGIEATFVITKNSSN- IGI 599 

INE+IL+GER+ D+IIVA GE++ +YS+VIASKAADT+L+MA +EA+FV+ + +S+ I I 
Sbjct: 541 IHEIILQGERLGDSIIVA^GEKKHLYSNVIASKAADTILSMAHVEASFVLVETASHKIAI 600 

Query: 600 SARSRNNIIWQRI^KLGGGGHFSFAACXJIQDKSVKQTORMLLEIIDEDLRENSTVEN 657 

SARSR+ INVQR+MEKLGGGGHF+ AACQ+ D S+ Q + +LL+ 1+ ++E VE+ 
Sbjct: 601 SARSRSKirWQRVMEKLGGGGHFNLAACQLTDISLPQAKYLLLKTINMTMKETGEVES 658 

A related GBS gene <SEQ ID 8717> and protein <SEQ ID 8718> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9 

McG: Discrim Score: 13.82 

GvH: Signal Score (-7.5): -0.890001 

Possible site: 44 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 2.97 threshold: 0.0 
PERIPHERAL Likelihood = 2.97 574 
modified ALOM score: -1.09 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

31.3/55.8% over 631aa
Bacillus subtilis

EGAD|19304| hypothetical 74.3 kd protein in rpli-cotf intergenic region Insert characterized
SP|P37484|YYBT_BACSU HYPOTHETICAL 74.3 KDA PROTEIN IN RPLI-COTF INTERGENIC REGION. Insert characterized
GP|467336|dbj|BAA05182.1||D26185 unknown Insert characterized
GP|2636598|emb|CAB16088.1||Z99124 yybT Insert characterized
PIR|S65976|S65976 yybT protein - Insert characterized

ORF00251 (364 - 2241 of 2580)
EGAD|19304|BS4045(20 - 651 of 659) hypothetical 74.3 kd protein in rpli-cotf intergenic region {Bacillus subtilis}
SP|P37484|YYBT_BACSU HYPOTHETICAL 74.3 KDA PROTEIN IN RPLI-COTF INTERGENIC REGION.
GP|467336|dbj|BAA05182.1||D26185 unknown {Bacillus subtilis}
GP|2636598|emb|CAB16088.1||Z99124 yybT {Bacillus subtilis}
PIR|S65976|S65976 yybT protein - Bacillus subtilis

%Match = 18.5
%Identity = 31.2 %Similarity = 55.8
Matches = 197 Mismatches = 271 Conservative Sub.s = 155
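The %Identity and %Similarity figures above are consistent with dividing the match counts by the 631-residue overlap quoted for this hit. The sketch below shows that arithmetic; treating similarity as matches plus conservative substitutions over the overlap length is an assumption that happens to reproduce the reported values, and the function name is hypothetical.

```python
def percent_identity_similarity(matches, conservative, aligned_length):
    """Return (%identity, %similarity), each rounded to one decimal place."""
    identity = 100.0 * matches / aligned_length
    similarity = 100.0 * (matches + conservative) / aligned_length
    return round(identity, 1), round(similarity, 1)

# Counts reported above: 197 matches, 271 mismatches, 155 conservative substitutions.
identity, similarity = percent_identity_similarity(197, 155, 631)

# Columns not covered by the three counts, presumably gap positions.
uncounted_columns = 631 - (197 + 271 + 155)
```

With these inputs the function returns 31.2 and 55.8, and eight columns of the overlap are left unaccounted for by the three counts.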



t *CSPLFIRGVLCYN*vLRGYLMKRFRFATVHLVLIGLILFGLI^ICWLFQSYTALLIjAIFVALSFWALLYYQKIT 
I I : : |:| | | =| | : | |:: =: 

MPSFYEKPLFRYPIYALIALSIITILISFYFNWILGTVEVLLLAVILFFIECRAD 



YEL-SEVEQ-IELLNDQTEVSLKSLLEQMPVGVIQFDLETNDIEWFNPYAELIFTGDN- -GHFQSATVKDIITSRRM3TA 

: |:: | |: : : : | ,||:|:, |: : ||| ||: | | | : :: : 

SLIRQEIDAYISTLSYRLKKVGEEALMEMPIGIMLFN-DQYYIEWANPFLSSCFNESTLVGRSLYDTCESWPLIKQEVE 



GQSFEYGDNKYSAYLDTETGVFYFFDNFMGNRRNYDSSMLRPVIGIISIDNYDDIMDTMLEADMSKINAXVTSFXSDFTQ 
lh=: ::|lll : I M I = I I 1 I I = = = I =h 111= : : I 

SETVTIiMRKFRWIKRDERLLYFFDVTEQIQIEXLYENERT^ 



966 996 1026 1056 1086 1116 1146 1176 

SKNIFYRRWn^DRYYIFTDYSVLNTLIKDKFDILNEFRKRAQENHL^ 

II =1 i = M : : I I II 11 = 1 M = = :>||||>|i = : = l :| =1= II III 

EYGIFLKRTSSERFIAVUJEIIILTELENSKFSILDEVREKTSFDGVALTLSVGVGASVSSLI03LGDLAQSSLDLALGRGG 
230 240 250 260 270 280 290 



1206 1236 1266 1296 1326 1356 1386 1416 

DQIVTOEIfflSSKKALYFGGGAVSTIKRSRTRTRAMMTAISDRLK^AroSVFIVGHRKLDMDALGASVGMQFFASNIVNASY 
40 ||: :: : | ::|| ||:| | | : |: : : :| |:||: |||::||::|: | 

DQVAIKLPNGKVTC- -FYGGKTNPMEKRTRVRARVISHALKEIVTESSNVI IMGHKFPDMDSIGAAIGILKVAQANNKDGF 
310 320 330 340 350 360 370 



1446 1476 1500 1530 1560 1590 1620 1650 

45 WYDPNDMNSDIERAIDYLQEDGE- -TRLVSVERAFELITQNSLLVMVDHSKTALTLSKEFFNKFADVIWDHHRRDEDF 

:| III : I : : I I = = = I =1 = = : I I 1= ::|lh|| | :| : : : || ::|:||||| |:| 
IVIDPNQIGSSVQRLIGEIKKYEELWSRFITPEEAMEISNDD-rLLVIVDTHKPSLVMEERLVNKIEHIWIDHHRRGEEF 
390 400 410 420 430 440 450 

50 1680 1710 1740 1770 1800 1830 1860 1890 

PKNAvLSFIESGASSASELVTELIQFQQAKDKLSRSQASILMAGIMLDTRNFASNVTSRTFDVASYLRGLGSNSMAIQKI 

= : =1 = = l III :||||||:::| = H = =1= I = I I I = = I I = = I = Hill M l |:::: :|| 

IRDPLLVYMEPYASSTAELVTELLEYQPKRLKINMIEATALIAGIIVDTKSFSLRTGSRTFDAASYLRAKGADTVLVQKF 
470 480 490 500 510 520 530 


1920 1950 2004 2034 2064 2091 2121 

SATDFDEYRLINELILKGERIYDNIIVAT- -GEEHKVYSHVIASKAADTMLTMAGIEATFVITK-NSSNIGISARSRNNI 
I I =11 111 =|: I = = =1= -llh = | = l = =11 = 1 = = = = I I I I I = 

LKETVDSYIKRAKLIQHTVLYKDNIAIASLPFJSIEEEYFDQVLIAQAM 
60 550 560 570 580 590 600 610 



2151 2181 2211 2241 2271 2301 2331 2361 

]WQRIMEKLGGGGHFSFAACQIQDKSVKQVRPJ4LLEIIDEDLRENSTVENRFX)*LR*KLFFYKMLRGKEKKVRLRKYLLV 
III III I MM:: II I:M= I Ml = 
NVQIIMEALEGGGHLTNAATQLSGISVSEALERLKHAIDEYFEGGVQR 
630 640 650

SEQ ID 8718 (GBS10) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 6; MW 98kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 7; MW 73kDa). 

The GST-fusion protein was purified as shown in Figure 189, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1037 

A DNA sequence (GBSx1109) was identified in S.agalactiae <SEQ ID 3205> which encodes the amino
acid sequence <SEQ ID 3206>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.4643 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA43972 GB:X62002 ribosomal protein L9 [Bacillus 
stearothermophilus] 
Identities = 80/149 (53%), Positives = 105/149 (69%), Gaps = 2/149 (1%) 

Query: 1 MEWIFLQDVKGKGiaCGEVKEVPTGyAQNFLLKKNIAKEATTQAIGELKGKQKSEEKAQAE SO 
MKVIFL+DVKGKGKKGE+K V GYA NFL K+ LA EAT + L+ +++ E++ AE 

Query: 61 ILAQAKELKTQLESETTRVQFIEKVGPDGRTFGSITAKKIAEELQKQYGIKIDKRHIDLD 120 

LA AK+LK QtiE T ■ + K G GR FGSIT+K+IAE LQ Q+G+K+DKR I+L 
Sbjct: 61 ELANAKKLKEQLEKLTVTI P - -AKAGEGGRLFGS I TSKQIAESLQAQHGLKLDKRKIELA 118 

Query: 121 HTIRAIGKVEVPVKLHKQVSSQIKLDIKE 149 

IRA4G VPVKLH +V++ +K+ + E 
Sbjct: 119 DAIRALGYTNVPVKLHPEVTATLKVHVTE 147 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3207> which encodes the amino acid 
sequence <SEQ ID 3208>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4630 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 119/150 (79%) , Positives = 138/150 (91%) 

^ MKVIFL DTOGKGKKGE+KEVPTGVAQNFL+KKlilLAKEAT+Q+IGELKGKQK+EEKAQAE 

Sbjct: 1 MKVIF^VTCGKGKKGEIKEVPTGYAQNFLIKJCNIAKEATSQSIGELKGKQKAEEKAQAE 60 

Query: 61 ILAQAKELKTQLESETTRVQFIEKVGPDGRTFGSITAKKIAEELQKQYGIKIDKRHIDLD 120 

ILA+A+ +K L+ + TRVQF EKVGPDGRTFGS I TAKKI+EELQKQ+G+K+DKRHT LD 
Sbjct: 61 IIJ^QAVKAVLDEDKTRVQFQJ^GPDGRTFGSITAKKISEELQKQFGVKVDKRHIVIjD 120

Query: 121 HTIRAIGKVEVPVKLHKQVSSQIKLDIKEA 150

H IRAIG +EVPVKLHK+V+++IKL I Eft. 
Sbjct: 121 HPIRAIGLIEVPVK1HKEVTAEIKLAITEA 150 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1038 

A DNA sequence (GBSx1110) was identified in S.agalactiae <SEQ ID 3209> which encodes the amino
acid sequence <SEQ ID 3210>. This protein is predicted to be DNA polymerase III delta prime subunit 
(dnaB). Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.43 Transmembrane 204 - 220 ( 204 - 220) 

Final Results 

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
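For readers who process these listings programmatically, the localisation lines above can be parsed and ranked by certainty. The following is a minimal Python sketch, not part of the original analysis; the function name and the tolerant regular expression (which allows stray spaces inside the number) are assumptions made for illustration:

```python
import re

# One "Final Results" line: site name, certainty value, verdict.
# Spaces inside the number are tolerated because scanned listings often
# split "0.1171" into "0. 1171" or "0 . 1171".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9 .]*)"
    r"\(\s*(Affirmative|Not Clear)\s*\)",
    re.IGNORECASE,
)


def parse_final_results(text):
    """Map each predicted site to a (certainty, verdict) tuple."""
    results = {}
    for site, number, verdict in RESULT_RE.findall(text):
        results[site.lower()] = (float(number.replace(" ", "")), verdict)
    return results


text = """
bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(text)
best_site = max(results, key=lambda site: results[site][0])
print(best_site, results[best_site])
```

Picking the site with the highest certainty is only one possible reading of the output; the listing itself marks the preferred site as "Affirmative".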

A related DNA sequence was identified in S.pyogenes <SEQ ID 2423> which encodes the amino acid 
sequence <SEQ ID 2424>. Analysis of this protein sequence reveals the following: 
Possible site: 21

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 210 - 226 ( 210 - 226)

Final Results 

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 397/450 (88%), Positives = 431/450 (95%), Gaps = 1/450 (0%)

Query: 3 EVSELRVQPQDLLAEQAVLGSIFISPEKLIMVREFISPDDFYKYSHKVIFRAMITLADRN 62 

EV+ELRVQPQDLLAEQ+VLGSIFISP+KLI VREFISPDDFYKY+HK+IFRAMITL+DRN 
Sbjct: 8 EVAELRVQPQDLLAEQS VLGS I FI S PDKL I AVREFISPDDFYKYAHKI I FRAMITLSDRN 67 

Query: 63 DAIDAATVRNIIiDDQGDLQNIGGLGYIVELVNSVPTSANAEFYAKIVSEKAMLRDIISKlj 122 

DAIDA T+R ILDDQ DLQ+IGGL YIVELVNSVPTSANAE+YAKIV+EKAMLRDII++L 
Sbjct: 68 DAIDATTIRTILDDQDDLQSIGGLSYIVELVNSVPTSANAEYYAKIVAEKAMLRDIIARL 127 



Query: 182 OTGLPTGFRDLDRITTGLHPDQLIILAARPAVGKTAFVLNIAQNVGTKQNRPVAIFSLEM 241 

VTGLPTGFRDLD+ITTGLHPDQL+ILAARPAVGKTAFVLNIAQNVGTKQ + VAIFSLEM 
Sbjct: 188 VTGLPTGFRDLDKITTGLHPDQLVILAAR?AV3KTAF\1iNIAQNVGTKQKKTVAIFSLEM 247 

Query: 242 GAESLVDRMLAAEG^IVDSHSLRTGQLTDQDWNNVTIAQGALADAPIYIDDTPGIKITEIR 301 

GAESLVDRMLAAEGtTOISHSLRTGQLTDQDWNNVTIACGALA+APIYIDDTPGIKITEIR 
Sbjct: 248 GAESLVDRMIJAAEG^ra3SHSLRTGQLTDQDVS^SINVTIAQGAIlAEAPIYIDDTPGIKITEIR 307 

Query: 302 ARSRKLSQEVDDGLGLIVIDYLQLISGTRPENRQQEVSEISRQLKILAKELKVPVIALSQ 361 

ARSRKLSQEVD GLGLI VIDYLQLI +GT+ PENRQQEVS+ 1 SRQLKILAKELKVPVIALSQ 
Sbjct: 308 ARSRKLSQEVDGGLGLIVIDYLQLITGTKPENRQQEVSDISRQLKILAKELKVPVIALSQ 367 



Query: 362 LSRGVEQRQDKRPVLSDIRESGSIEQDADIVAFLYRDDYYRREGEEAEEI VEDNTVEVIL 421 
LSRGVEQRQDKRPVLSDIRESGSIEQDADIVAFLYRDDYYR+E ++AEE VEDNT+EVIL
Sbjct: 368 LSRGVEQRQDKRPVLSDIRESGSIEQD?J3IVA7LYRDDYYRKECDDAEEAVEDNTIEVIL 427
Query: 422 EKNRAGARGTVKLMFQKEYNKFSSIAQFEE 451 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1039 

A DNA sequence (GBSx1111) was identified in S.agalactiae <SEQ ID 3211> which encodes the amino
acid sequence <SEQ ID 3212>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.4909 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3213> which encodes the amino acid
sequence <SEQ ID 3214>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3467 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/90 (85%) , Positives = 84/90 (92%) 

Query: 1 MSDAFADVAKMKKIKEDIKSHEGQNVELTLENGRKREKNKIGEILIEVYPSLFIVEYKDTA 60 

MSDAF DVAKMKKIKEDI++HEGQ+VELTLENGRKREKNKIGRDIEVY SLFI+EY D++ 
Sbjct: 11 MSDAFTDVAKMKKIKEDIRAHEGQLVELTLENGRKREKNKIGRLIEVySSLFIIEYSDSS 70 

Query: 61 AVPGAIDNTYVESYTYSDILTEKTLIRYFD 90

PGAIDN+ YVES YTYSDILTEKTL I RY D 
Sbjct: 71 DTPGAIDNSYVESYTYSDILTEKTLIRYLD 100 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1040 

A DNA sequence (GBSx1112) was identified in S.agalactiae <SEQ ID 3215> which encodes the amino
acid sequence <SEQ ID 3216>. This protein is predicted to be 30S ribosomal protein S4 (rpsD). Analysis of
this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2937 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00397 GB:AF008220 ribosomal protein S4 [Bacillus subtilis] 
Identities = 138/201 (68%), Positives = 158/201 (77%), Gaps = 1/201 (0%) 

Query: 1 MSRYTGPSWKQSRRLGLSLTGTGKELARRNYVPGQHGPffiTOSKLSEYGLQIiAEKQKLRFS 60 

M+RYTGPSWK SRRLG+SL+GTGKEL +R Y PG HGP R KLSEYGLQL EKQKLR 
Sbjct: 1 MARYTGPSWKLSRRLGISLSGTGKELEKRPYAPGPHGPGQRKKLSEYGLQLQEKQKLRHM 6 0 

Query: 61 YGLGEKQFRNLWQATKAICEGTLGFHFMVIJjEMUiDNvVYRLGIATTRRQARQPVNHGHI 120 

YG+ E+QFR LF +A K G G NFM+IjIh- RLDNWY+LGLA TRRQARQ VNHGHI 
Sbjct: 61 YGVlffiRQFRTLFDKAGKI^-GKHGENFMILII)SRLDNvWKlGLARTRRQARQLVNHGHl 119 

Query: 121 LVDGKRVDIPSYRVTPGQVISVREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL 180 

LVDG RVDIPSY V PGQ I VREKS + I E+VE P 4-++FDAEKLEG+ TRL 

Sbjct: 120 LVDGSRvDIPSYLWPGCTIGWEKSRNLSIIKESVEVNNFVPEYLTFDAEKLEGTFTRL 179 

Query: 181 PERDE INPE INEALWEFYNK 201 

PER E+ PEINEAL+VEFY+4 
Sbjct: 180 PERSELAPE INEAL I VEFYSR 200 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3217> which encodes the amino acid 
sequence <SEQ ID 3218>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2937 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/203 (99%), Positives = 201/203 (99%) 

Query: 1 MSRYTGPSWKQSRRLGDSLTGTGKEIARRNWPGQHGPNITOSKI,SEYGLQI J AEKQKLRFS 60 

MSRYTGPSWKQSRRLGLSLTGTGKELARRNYVPGQHGPNNRSICLSEYGLQLAEKQKLRFS 
Sbjct: 1 MSRYTGPSWKQSRRLGLSLTGTGKEIARRNYVPGQHGPNNRSKLSEYGLQLAEKQKLRFS 60 

Query: 61 YGLGEKQFROT I FVQATKAKEGTLGFNF^WLLF^LDNvvYRLGLATTRRQARQFVNHGHI 120 

YGLGEKQFRNLFVQATK KEGTLGFNFMVLLERRLDNWYRLGLATTRRQARQFVNHGHI 
Sbjct: 61 YGLGEKQFRNLFVQATKIKEGTLGFNFI^HjLERRLDNVVYRIjGIATTRRQARQFVNHGHI 120 

Query: 121 LVDGKRVDIPSYRVTPGQVISvREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL 180 

LVDGKRVDIPSYRV pgqvisvreksmkvpaileaveatlgrpafvsfdaeklegsltrl 

Sbjct: 121 LVDGKRVDIPSYRVDPGQVISVREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL 180 

Query: 181 PERDEINPEINEALWEFYNKML 203 

PERDEINPEINEALWEFYNKML 
Sbjct: 181 PERDEINPEINEALWEFYNKML 203 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1041 

A DNA sequence (GBSx1113) was identified in S.agalactiae <SEQ ID 3219> which encodes the amino
acid sequence <SEQ ID 3220>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4067 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98302 GB:AF243383 unknown; 0rf3 [Lactococcus lactis subsp. 
lactis] 

Identities = 46/97 (47%) , Positives = 69/97 (70%) 


Query: 1 MNLTORLKIEEMEEKyDSFKPRINiU^WAIDDFQKHYEDYVKLREFYGEEDWFRLSEQTE 60 

M+ D I++ME KYD+F P + F Y +Y++LR FYGSE WF E + 

Sbjct: 1 MDNKDIELIQQMENK^DTFMPvX,TNLIDSVEKFNSIYNNYIELRNFYGSEKWFEYMEIEK 6 0 

Query: 61 HNLKCGVLSEDQLFDFIGEHNELVGQFLDMSSQMYRH 97

+KCGVL+EDQLFD I +HNEL+G LD++S+MY++ 
Sbjct: 61 IPVKCGVLTEDQLFDMISDHNELLGVLLDLTSKMYKM 97 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3221> which encodes the amino acid
sequence <SEQ ID 3222>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3465 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 43/98 (48%) , Positives = 74/98 (74%) 
Query: 1 MNLNDRLKIEEMEEKYDSFKPRINALVEAIDDFQKHYEDYVKLREFYGSEDWFRLSEQTE 60 
Sbjct: 

Query: 61 NNLKCGVLSEDQLFDFIGEHNELVGQFLDMSSQMYRHL 98 

+++ CGVLSED LFD IG+HN+L+ LD++ MY+H+ 
Sbjct: 61 DDIPCGVLSEDLLFDMIGDHNQLLADILDLAPIMYKHM 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1042 

A DNA sequence (GBSx1114) was identified in S.agalactiae <SEQ ID 3223> which encodes the amino
acid sequence <SEQ ID 3224>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0965 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04438 GB:AP001509 transcriptional regulator (TetR/AcrR 
family) [Bacillus halodurans] 
Identities = 47/181 (25%), Positives = 95/181 (51%), Gaps = 16/181 (8%) 

Query: 4 DTRREKTKRAIEAAMITLLKDQSFDEISTIlvXTKTAGISRSSFYTHYKDKYEMIDQYQQS 63

Query: 64 LFNKV-EYIFDRNQFKKEDAL LEIFQFLDRESLPAALLTQNGTKEIQTYILNKLQ 117

+ + E + N K E+AL L ++ +RES L ++ G Q K 
Sbjct: 66 IIKDLSEALSSYKYTKDEEALQMTENLLVYIANNRESC-QTDFSEYGDPSFQ KKV 119 

Query: 118 LMLSKELPWNP DATKSDINHLYYSWLSHMFGVYQMWITRGKKESPQQITQVLLSL 175 

+ML+ + + P TK DI+ Y S+Y+ + + Q W+ G K+SP+++ ++++ L 

Sbjct: 120 MMU^HVIKTPLVGKHTKPDISE-YVSLYIVNGSIHIVQSWLKNGLKQSPKEMAELIIKL 179 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3225> which encodes the amino acid 

sequence <SEQ ID 3226>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB04438 GB:AP001509 transcriptional regulator (TetR/AcrR
family) [Bacillus halodurans] 
Identities = 47/180 (26%) , Positives = 88/180 (48%) , Gaps = 13/180 (10%) 

Query: 4 RKENTKQAILKA^WMLLKTESFDDITTVKLSKRAGISRSSFYTHYKDKYEMIDYYQQTFF 63 

RK+- T+ + ++++ L++ + +IT ++ A I+RS+FY+HY D Y+++ + 
Sbjct: 8 RKKYTRMLLKESLMKLMQEKPLSNITIKEICDLADINRSTFYSHYTDLYDLLYQIEDEII 67 

--QREQLLSSLLSANGTKEIQAFIINKVRLL- 117 
+ +L S G Q KV +L 

Sbjct: 68 KDLSEALSSYNYTKDEEALQMTENLIiVYIANNRESCQTLFSEYGDPSFQ KKVMMLA 123 

Query: 118 ITTDLQDKFSTEELSQTEKEYQSIYLAHAFFGVCQSWIAKGKKESPQEMTQFVLKM 173 

I T Ii K + ++S EY S+Y+ + + QSW+ G K+SP+EM + ++K+ 
Sbjct: 124 HDHVIKTPLVGKHTKPDIS EYVSIiYIVNGSIHlVQSVJLKNGLKQSPKEMAELIIKL 179 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 100/179 (55%), Positives = 134/179 (73%), Gaps = 2/179 (1%) 

Query: 1 MVNDTRREKTKRAIEAAMITLLKDQSFDEISTINLTKTAGISRSSFYTHYKDKYEMIDQY 60 

MVN R+E TK+AI AM+ LLK +SFD+I+T+ L+K AGISRSSFYTHYKDKYEMID Y 
Sbjct: 1 MVN--RKENTKQAILKAMVMLLKTESFDDITTA7KLSKRAGISRSSFYTHYKDKYEMIDYY 5B 

Query: 61 QQSLFNKVEYIFDRNQFKKEDALLEIFQFLDRESLFAALLTQNGTKEIQTYIIiNKLQLML 120 

QQ+ F+K+EYIF++ KE A LE+F+FL RE I> ++LL+ NGTKEIQ +I+NK++L++ 
Sbjct: 59 QQTFFHKLEYIFEKKYQNKEQAFLEVFEFLQREQLLSSLLSANGTKEIQAFIINKVRLLI 118 

Query: 121 SKELPVVNPDATKSDINRLYySVYLSHAIFGVYQMVJITRGKKESPQQITQVLLSLLPQT 179 

+ +L S + Y S+YL+HA FGV Q WI +GKKESPQ+4TQ +L +D T 

Sbjct: 119 TTDLQDKFSTEELSQTEKEYQSIYLAHAFFGVCQSVIIAKGKKESPQEMTQFVLKMLTST 177 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1043 

A DNA sequence (GBSx1115) was identified in S.agalactiae <SEQ ID 3227> which encodes the amino
acid sequence <SEQ ID 3228>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have [...] N-terminal signal sequence

INTEGRAL Likelihood = -10.[...] Transmembrane 790 - 806 ( 787 - 808)
INTEGRAL Likelihood = [...].32 Transmembrane 707 - 723
INTEGRAL Likelihood = [...].11 Transmembrane 637 - 653
INTEGRAL Likelihood = [...].32 Transmembrane 678 - 694
INTEGRAL Likelihood = [...].44 Transmembrane 55 - 71
INTEGRAL Likelihood = [...].22 Transmembrane 732 - 748

Final Results 

bacterial membrane --- Certainty=0.5140 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10287> which encodes amino acid sequence <SEQ ID 
10288> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12856 GB:Z99109 alternate gene name: yixE-similar to phage 
infection protein [Bacillus subtilis] 
Identities = 227/783 (28%), Positives = 387/783 (48%), Gaps = 60/783 (7%) 

Query: 45 KAIIKSPKLWITMAGVALIPTLYNVIFLSSMWDPYGNTKNLPVAVVNQDKSAKLNGKTIS 104 

K 1+ S KL I + + +P +Y+ +FL + WDPYG LPV WNQDK A G+ + 
Sbjct: 9 KDIVTSKKLLIPIIAILFVPLIYSGVFLKAYWDPYGTVDQLPVWVNQDKGATYEGEKLQ 68 

Query: 105 IGKDMEDNLSKNDSLDFHFTT-AKl^KELEKGHYYMVITFPKDLSRKATTLMTEKPERL 163 

IG D+ L N++ D+HF+ ++ K+L YY+V+ P+D S+ A+T++ + P++L 
Sbjct: 69 IGDDLVKELKDNNNFDWHFSNDLDQSLKDLLNQKYYLVWIPEDFSKNASTVLDKNPKKL 128 

Query: 164 NITYKTTKGRSFVASKMSETAANKLKDEVAESITGTYTESVFKNMGSMKTGINKAADGSQ 223 

++ Y T G ++V + + E A +KLK V++ +T YT+ +F N + G++ A+ G++ 
Sbjct: 129 DLICYHTNAGSNYVGATIGEKAIDECLKASVSKEVTEQYTICVIFDNFKDIAICGLSDASSGAK 188 

Query: 224 ELLNGSNKLQDGSQTLTSNLDVIASSSQTFSGGANKLNSGINLYTDGVGTLSNGLETLSD 283 

++ +G+ ++GS L NL L S+ T S +L G T G+ +L + L D 
Sbjct: 189 KIDDGTKIlAKNGSAQLKENIiAKLKESTATISDKTAQLADGAAQVTSGIQSLDSSLGKFQD 248 



Query: 337 LQNLSDG- -LKNIjNQIITNLQSTATTDSDTNSKLFNFLSTIESSTKALMNTAAADKQKQM 394 

+ + + L L + NL+ + T + +L +F +++++ +A N + + 
Sbjct: 309 AEKIINALDLTKLETAVNNLEKSETAIV1KEFKKQLTDFENSLKNRDQAFKN--VINSSDFL 366 

Query: 395 TAVQST SAFKSLTPEQQSQITSAVTGTPTSAE-TIAANISSNIENMKTVLSEASSS 449 

TA Q + S K L ++ PT+ + A I S++E++K +++ + 

Sbjct: 367 TAEQKSQLINSVEKKLPQVDAPDFDQILSQLPTADQLPDIATIKSSLEDVKAQVAQVKAM 426 

Query: 450 APSN NGSQNLQTLSGTANNLVLKAI SDLDKI QKLPTATKQLYQGSQTLTKGITDYT 505 

+ NG++ +Q D I +L ++Y GSQ LT G T T 

Sbjct: 427 PEATSKLYNGAKTIQ DAIDRLTEGADKIYNGSQKLTDGQTKLT 469 

Query: 506 NAVGQLRKGAVTLDSKSNQLISGTQKASQGAQTIiDSKSDQLRDGAGQLASGSDRIADGSN 565 

+G+ K + S QL++G S Q+ G +L GS ++ GS+ 

Sbjct: 470 AGIGEYNKQFAKAKAGSEQLVTG SSQVSGGLFKLLDGSKQVQSGSS 515 

Query: 566 KIAGGGHQLTDGLTELSGGVSQLSSSLGKAGDQLSMVSVNKDNANAVSSPVTIKHEDYDS 625 

KLA G L GL +L G +LSS LADQ + + +PVK+ S 

Sbjct: 516 KLADGSASLDTGLGKLLDGTGELS S KLKDAADQTGD I DADDQTYGMFADPVKTKDDAIHS 575 

Query: 626 VDTNGVGmPYMISVAIMWALSANVIFAKALSGKEPANRFSWAKNK- - -LLINGFIATL 682 

V G G+ PY++S+ L V + V+F + P N F W +K 4++ G I +L 
Sbjct: 576 VPNYGTGLTPYILSMGLWGGIMLTVVFPLKEASGRPRNGFEWFFSKFNVMMLVGIIQSL 635 

Query: 683 -AATILFFAVQFIGLKPDYPGKTYFIILLTAVWLMALVTALVGWDNRYGSFLSLLILLFQ 741
AT+L IGL+ + + Y ++T+ +A++ h G F++++IL4 Q

Sbjct: 636 IVATVLLLG IGLEVESTWRFYVFTIITSLAFLAIIQFIiA'rTMGNEGRFIAVIIIjVLQ 692 

Query: 742 LGSSAGTYPIELSPKFFQTIQPFLPMTYSVSGLRETISLTGDVNHQWRMLVIFLVSSMIL 801 

LG+S GT+P+EL P F+Q I LPMTYS++G R IS GD + W+M + + ++++ 
Sbjct: 693 LGASGGTFPLELLPNFYQVIHGALPMTYSINGFRAVIS-NGDFGYMWQMAGVIjIGIALVM 751 

Query: 802 ALL 804 
L 

Sbjct: 752 IAL 754 
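The summary line that precedes each alignment (Identities, Positives, Gaps) follows a regular pattern and can be read mechanically. A minimal Python sketch, with hypothetical names, that extracts the counts and keeps the printed percentage alongside the computed ratio rather than assuming a particular rounding rule:

```python
import re

# Matches e.g. "Identities = 227/783 (28%)"; the printed percentage is optional
# and a stray full stop after the field name is tolerated.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*\.?\s*=\s*(\d+)\s*/\s*(\d+)"
    r"(?:\s*\(\s*(\d+)\s*%\s*\))?"
)


def parse_summary(line):
    """Return {field: {count, length, printed_pct, ratio_pct}} for one summary line."""
    stats = {}
    for name, count, length, printed in FIELD_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,
            "length": length,
            "printed_pct": int(printed) if printed else None,
            "ratio_pct": 100.0 * count / length,
        }
    return stats


line = "Identities = 227/783 (28%), Positives = 387/783 (48%), Gaps = 60/783 (7%)"
stats = parse_summary(line)
for name, values in stats.items():
    print(name, values["count"], values["length"], values["printed_pct"])
```

Keeping both numbers makes it easy to flag lines where a scanned count or percentage may have been misread.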

A related DNA sequence was identified in S.pyogenes <SEQ ID 2017> which encodes the amino acid
sequence <SEQ ID 2018>. Analysis of this protein sequence reveals the following:

>>> Seems to have [...] N-terminal signal sequence

INTEGRAL Likelihood = [...] Transmembrane 735 - 751 ( 729 - [...])
INTEGRAL Likelihood = -5.79 Transmembrane 582 - 598 ( 580 - 601)
INTEGRAL Likelihood = -3.66 Transmembrane 652 - 668 ( 650 - [...])
INTEGRAL Likelihood = -2.97 Transmembrane 14 - 30 ( 14 - [...])
INTEGRAL Likelihood = -2.66 Transmembrane 623 - 639 ( 622 - [...])



Final Results 

bacterial membrane --- Certainty=0.4715 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 360/779 (46%), Positives = 508/779 (64%), Gaps = 32/779 (4%)

Query: 40 MLDELKAIIKSPIOliWIT^GVALIPTLYOTIFLSSMWDPYGNTKNLPVAVVNQDKSAKLN 99 

ML+ELK +IK+PKL ITM GVAL+P LYN+ FL SMWDPYG +LP+AWN DK AK 
Sbjct: 1 MLEELKTLIKNPKLMITMIGVALVPALYNLSFLGSMWDPYGRVNDLPIAVVNHDKPAKRA 60 

Query: 100 GKTISIGKDMEDNLSKNDSLDFHFTTAKRAEKELEKGHYYMVITFPKDLSRKATTLMTEK 159 

K+++IG DM D +SK+ L++HF +AK+A++ L++G YYMVIT P+DLS++A TL+ + 
Sbjct: 61 DKSLTIGNDMVDKMSKSKDLEYHWSAKQAQEGLKEGDYYMVITLPEDLSQRAATLLNPE 120 

P++L I Y+T+KG VA+KM ETA KLK+ V+4+IT TYT +VF +M +++G+ +A+ 
Sbjct: 121 PQKLTIRYQTSKGHGMVAAKMGETAMAKLKESVSQNITKTYTSAVFSSMTDLQSGLKEAS 180 



Query: 220 t 

GSQ L +G+ Q GSQTL+4NL L +SQ F G +L SG+ YTDGV + NGL 
Sbjct: 181 AGSQA1ASGAKTAQAGSQTLSTNLAALTGASQQFQQGTGRLTSGLTTYTDGVNQVKNGLG 240 

Query: 280 TLSDGVTAYTTGVHKLSEGSQKLDDKSQALVEGSEKLTDGLQQLSQATQLKPEQERTLQN 339 

TLS + Y GV +LS+G+ +L+ GL QL+QAT L E+ + +Q+ 

Sbjct: 241 TLSTDIPNYLNGVSRLSQGASQLNQ GLSQLTQATTLSDEKAKGIQS 285 

Query: 340 LSDGLKNLNQI ITNLQSTATTDSDTN SKLFNFLSTIESSTKALMNTAAADKQKQMTA 396 

L GL LNQ IL++T N +LNLI+K++ A + ++++A 
Sbjct: 287 LIVGLPVLNC^ICK3LNTELSTLQPPNLNADEIiGNSI^3AIAQftftKQVIAEETAAQNEELSA 346 

Query: 397 VQSTSAFKSLTPEQQSQITSAVTGTPTSAETIAAN-ISSNIENMICTVLSEASSSAPSNNG 455 

+Q+TS ++SLT EQQ ++ +A++ + S AA I S+++ + T L S S 
Sbjct: 347 LQATSVYQSLTAEQQGELAAALSQSDKSQTVSAAQTILSSVQTLSTSLQSLSQEDQSKQL 406 

Query: 456 SQNLQTLSGTANNLVLKAISDLDKIQKLETATKQLYQGSQTLTKGITDYTNAV GQL 511 

Q + ++ AN Q LP A+ L + S L K V QL 

Sbjct: 407 EQLKEAVAQIANQ SNQALEGASSALTELSTGLAKVNGSLNQQVLPGSNQL 456 

Query: 512 RKBAVTLDSKSNQLISGTQKASQGAQ/TLDSKSDQLRDGAGQLASGSDRIADGSNKLAGGG 571 

G L+ + + SG K S+GA L SKS +h DG+ QL+ G+ ++ADGS++L+ GG 
Sbjct: 457 TTGLAQLNRYNTAIGSGVIKLSEGANALSSKSGELLDGSHQLSEGATKLADGSSQLSQGG 516

Query: 572 HQLTDGLTELSGGVSQLSSSLGKAGDQLSMVS\7NKDHANAVSSPVTIKHEDTOSVDTNGV 631

HQLT GLTELS G+S L+ SL KA QLS+VSV NA AV+ P+ + +D D V TNG+ 
Sbjct: 517 HQLTSGLTELSTGLSTIjNGSLAKASQQLSLVSVTDKNAKAVAKPLVIiNEKDKDGVKTNGI 576 

Query: 632 GMAPYMI SVALMWALSANVI FAKALSGKEPANRFSWAKNK1LINGFIATLAATILFFAV 691 

GMAPYMI + V+LMWALS NVIFA +LSG+ +++ WAK K +INGFI+T+ + +L+ A+ 
Sbjct: 577 GI^PyMIAVSL^WVALSTOTIFANSLSGRPVKX>KM)WAKQKFVINGFISTMGSIVIlYIIAI 636 

Query: 692 QFIGLKPDYPGKTYFIILLTAWTLMALVTALVGWDNRYGSFLSLLILLFQLGSSAGTYPI 751 

Q +G + Y +T I+L+ WT MALVTALVGWD+RYGSF SL++LL Q+GSS G+YPI 
Sbjct: 637 QLLGFEARYGMETLGFIMLSGWTFMALVTALVGWDDRYGSFASLVMLLLQVGSSGGSYPI 696 

Query: 752 ELSPKFFQTIQPFLPMTYSVSGLRETISLTGDVNHQKRMLVIFLVSSMILALLIYRKQE 810 

ELS FFQ + PFLPMTY VSGLR+TISL+G + + ++L FL++ M+LALLIYR ++ 
Sbjct: 697 ELSGAFFQKLHPFLPMTYWSGLRQTISLSGHIGVEVKVLTGFLIAFMVLALLIYRPKK 755 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1044 

A DNA sequence (GBSx1116) was identified in S.agalactiae <SEQ ID 3229> which encodes the amino
acid sequence <SEQ ID 3230>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2664 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1045 

A DNA sequence (GBSx1117) was identified in S.agalactiae <SEQ ID 3231> which encodes the amino
acid sequence <SEQ ID 3232>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.45 Transmembrane 48 - 64 ( 45 - 69) 
INTEGRAL Likelihood = -1.49 Transmembrane 71 - 87 ( 71 - 87) 

Final Results 

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
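The INTEGRAL lines in the analysis above have a fixed layout (likelihood, a residue range, and a second range in parentheses), so they can be pulled into a small table. A minimal Python sketch follows; the function name is hypothetical, and reading the parenthesised range as an extended span around the core segment is an assumption rather than something the listing states:

```python
import re

# Layout: "INTEGRAL Likelihood = -9.45 Transmembrane 48 - 64 ( 45 - 69)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for match in TM_RE.finditer(text):
        start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
        segments.append({
            "likelihood": float(match.group(1)),
            "core": (start, end),
            "extended": (ext_start, ext_end),  # assumed meaning of "( a - b)"
        })
    return segments


text = """
INTEGRAL Likelihood = -9.45 Transmembrane 48 - 64 ( 45 - 69)
INTEGRAL Likelihood = -1.49 Transmembrane 71 - 87 ( 71 - 87)
"""
segments = parse_tm_segments(text)
for seg in segments:
    core_start, core_end = seg["core"]
    print(seg["likelihood"], seg["core"], core_end - core_start + 1)
```

In this example both core ranges span 17 residues, which offers a quick sanity check when a scanned range looks suspicious.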

A related GBS nucleic acid sequence <SEQ ID 9441> which encodes amino acid sequence <SEQ ID 9442> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA25222 GB:M87483 ORF 1 [Lactococcus lactis]
Identities = 50/88 (56%), Positives = 66/88 (74%), Gaps = 1/88 (1%) 

Query: 2 TGKIFSMSKEELSYLPVIKLFKNCGVYNGLIGLFLLYGLYISQNQ-EIVAVFLINVLLVA 60

Query: 61 IYGALTVDKKILLKQGGLPILMiLTFLF 88
+YG+LT +KKI+L QGGLi ILAL++ F

Sbjct: 92 LYGSLTSNKKIILTQGGLAILALISSFF 119

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8719> and protein <SEQ ID 8720> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 4.19
GvH: Signal Score (-7.5): -3.99
Possible site: 38
>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 3 value: -9.45 threshold: 0.0 

INTEGRAL Likelihood = -9.45 Transmembrane 87 - 103 ( 84 - 108) 
INTEGRAL Likelihood = -1.49 Transmembrane 110 - 126 ( 110 - 126) 
INTEGRAL Likelihood = -0.37 Transmembrane 13 - 29 ( 13 - 29) 
PERIPHERAL Likelihood = 0.47 65

modified ALOM score: 2.39 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF00610(328 - 681 of 981) 

SP|Q02009|YTRP_LACLA(1 - 119 of 119) HYPOTHETICAL 13.3 KDA PROTEIN IN TRPE 5'REGION. 
GP|551879|gb|AAA25222.1||M87483 ORF 1 {Lactococcus lactis} PIR|S35123|S35123 hypothetical
protein (trpE 5' region) - Lactococcus lactis subsp. lactis
%Match = 19.9

%Identity = 58.8 %Similarity = 77.3

Matches = 70 Mismatches = 26 Conservative Sub.s = 22 

114 144 174 204 234 264 294 324 

SPKFFQTIQPFLPMTYSVSGLRETISLTGDVNHQWRMLVIFLVSSMILALLIYRKQED**KVSSDRLTV*YGMSKYLGGE

354 384 414 444 474 504 534 561 

DMSTLTIIIATLTALEHFYIMYLETLATQSNMTGKIFSM3KEELSYLPVIKLFKNQGVYNGLIGLFLLYGLYISQNQ-EI 

I.- mi- mm i num.- h i i -mi mi i immiiiiiii m == i mi 

MTILTIILSLLVALEFFYIMYLETFATSSKTTSRVFNMGKEELERSSVQTLFKNQGIYNGLIGLGLIYAIFFSSAQLEI



VAVFLINVLLVAIYGALTVDKKILLKQGGLPILALLTFLF*YYLAVRFS*TAFSNHFFLIIQW*VICL*K*YNITTNSK 
I : = ll mihlMI :||| = l Mil ||||:: =1 
VRLLLIYI ILVALYGSLTSNKKI ILTQGGLAILALISSFF 
90 100 110 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1046 

A DNA sequence (GBSx1118) was identified in S.agalactiae <SEQ ID 3233> which encodes the amino
acid sequence <SEQ ID 3234>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3140 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10285> which encodes amino acid sequence <SEQ ID 
10286> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



Query: 13 KDGSDIYYRWGQGQPIVFIJ1GNSLSSRYFDKQIAYFSKYYQVIVMDSRGHGKSHAKLNT 72 

+D + +YY G G PI+F+HG +S ++F KQ + S YQ I +D RGHG+S L+ 
Sbjct: 7 EDQTRLYYETHGSGTPILFIHGVLMSGQFFHKQFSVLSANYQCIRLDLRGHGKSDKVLHG 66 

Query: 73 I S FRQIAVDLKDI LVHLEIDKVI LVGHSDGA 103 

+ Q A D+++ L +E+D V+L G S GA 
Sbjct: 67 HTISQYARDIREFLNAMELDHWLAGWSMGA 97 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1047

A DNA sequence (GBSx1119) was identified in S.agalactiae <SEQ ID 3235> which encodes the amino
acid sequence <SEQ ID 3236>. This protein is predicted to be an integral membrane protein. Analysis of 
this protein sequence reveals the following: 



Possible site: 58

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -12.90 Transmembrane 14 - 30 ( 9 - 41)
INTEGRAL Likelihood = -9.71 Transmembrane 451 - 467 ( 447 - 472)
INTEGRAL Likelihood = -9.18 Transmembrane 234 - 250 ( 229 - 257)
INTEGRAL Likelihood = -8.07 Transmembrane 56 - 72 ( 46 - 77)
INTEGRAL Likelihood = -8.01 Transmembrane 490 - 506 ( 484 - 512)
INTEGRAL Likelihood = -5.84 Transmembrane [...] - 430 ( 412 - 436)
INTEGRAL Likelihood = [...].99 Transmembrane 136 - 152 ( 135 - 159)
INTEGRAL Likelihood = -4.[...] Transmembrane 213 - 229 ( 211 - 232)
INTEGRAL Likelihood = [...].14 Transmembrane 365 - 381 ( 364 - 382)
INTEGRAL Likelihood = -2.[...] Transmembrane 393 - 409 ( 391 - 412)
INTEGRAL Likelihood = -1.06 Transmembrane 168 - 184 ( 167 - 184)
INTEGRAL Likelihood = -0.64 Transmembrane 275 - 291 ( 275 - 291)
INTEGRAL Likelihood = -0.32 Transmembrane 328 - 344 ( 328 - 345)
INTEGRAL Likelihood = -0.27 Transmembrane 821 - 837 ( 821 - 837)

Final Results

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10283> which encodes amino acid sequence <SEQ ID 
10284> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA24464 GB:D85082 YfiX [Bacillus subtilis] 
Identities = 190/596 (31%), Positives = 324/596 (53%), Gaps = 31/596 (5%) 
Query: 246 IVSLIPGGLGSFELVLFTGFAAEGLPKETWAWLLLYRI^YYIIPFFAGIYFFIHYLGSQ 305 

++SL+PGG GSF+L+ G G +E +V ++LYRLAY IPF G++F L 
Sbjct: 1 MISLVPGGFGSFDLLFLLGMEQLGYHQEAIVTSIVLYRIAYSFIPFILGLFFAAGDLTEN 60 

Query: 306 INQRYENVPK ELVSTVLQTMVSHLMRILG AFLIFSTAFFENITYIMWLQKLG 357 

+R E P+ E 4 +L + L+RIL + ++F + + + +L 

Sbjct: 61 TMKRLETNPRIAPAIETTNVLLWQRAVLVRILQGSLSLIVFVAGLIVIASVSLPIDRLT 120 

Query: 358 LDP-LQEQMLWQFPGLLLGVCFILIiARTID--QKOTKNAFPlAIIWITLTLFYLNLGHISW 414 

+ P + L F GL L ILL 1+ ++ K ++ +AI + + L ++ 

Sbjct: 121 VIPHIPRPALLLFNGLSLSSALILLILPIELYKRTKRSYTMAITALVGGFVFSFLKGLNI 180 

Query: 415 RLSFWFILLLLGLLVIKFTLYKKQFIYSWEERIKDGIIIVSLMGVLFY IAGLLFPI 470 

F ++++ L+++K ++Q Y+ + I V+L V + IAG ++ 

Sbjct: 181 SAI FVLPMI I VLLVLLKKQFVREQASYTLGQLI FAVALFTVALFNYNLIAGFIWDR 236 

Query: 471 RAHITGGSIERLHYIIAWEPIALATL ILTLVYLCLVKILQGKSCQIGDVFNVDRYK 526 

+ + +++ + I AT+ 1+ L +L + ++ IG+ + +R 
Sbjct: 237 MKKV LRHEYFVHSTSHITHATIMAIIIVPLFFLIFTWYHKRTKPIGEKADPERLA 292 

Query: 527 KLLQAYGGSSDSGLAFLNDKRLYWYQKNGEDO/AFQFVIVMNKCLIMGEPAGDDTYIREA 586 

L GG++ S L FL DKR Y + +G + F + + +++G+P+G 
Sbjct: 293 AFLNEKGGNALSHLGFLGDKRFY-FSSDGNALLLFGKIA- -RRLWLGDPSGQRESFPLV 349 

Query: 587 I ES F I DDADKLDYDLVFYS IGQKLTLLLHE YGFDFMKVGEDALVNLETFTLKGNKYKPFR 646 

+E F+++A + + ++FY I ++ L H++G++F K+GE+A V+L TFTL G K R 
Sbjct: 350 LEEFIJNEAHQKGFSVLFYQIEREDMALYHDFGYNFFKLGEEAYVDLNTFTLTGKBCKAGLR 409 

Query: 767 DLGMAPLSGVGRVETSFAKERMAYLVYHFGSHFYSFKGLHKYKKKFTPLWSERYIS 822 

++GMAPL+ VG TSF ER A ++++ + YSF+GL +K+K+ P W +Y++ 
Sbjct: 529 mGMAPDAOTGTAFTSFWSERFAAVIFKlWRYMYSFSGLRAFKEKYKPEWRGKYLA 584 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8721> and protein <SEQ ID 8722> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: [...] 
McG: Discrim Score: [...] 
GvH: Signal Score (-7.5): -7.6[...] 
Possible site: [...] 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 14 value: [...].90 threshold: 0.0 
INTEGRAL Likelihood = [...] Transmembrane 14 - 30 ( 9 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane 451 - 467 ( 447 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane 234 - 250 ( 229 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane 56 - 72 ( 46 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane 490 - 506 ( 484 - 512) 
INTEGRAL Likelihood = [...] Transmembrane 414 - 430 ( 412 - 436) 
INTEGRAL Likelihood = [...] Transmembrane 136 - 152 ( 135 - 159) 
INTEGRAL Likelihood = [...] Transmembrane 213 - 229 ( 211 - 232) 
INTEGRAL Likelihood = [...] Transmembrane 365 - 381 ( 364 - 382) 
INTEGRAL Likelihood = [...] Transmembrane 393 - 409 ( 391 - 412) 
INTEGRAL Likelihood = [...] Transmembrane 168 - 184 ( 167 - 184) 
INTEGRAL Likelihood = [...] Transmembrane 275 - 291 ( 275 - 291) 
INTEGRAL Likelihood = [...] Transmembrane [...] 345) 
INTEGRAL Likelihood = [...] Transmembrane [...] 837) 
PERIPHERAL Likelihood = 1.[...] 
modified ALOM score: 3.08 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

ORF00608(967 - 2787 of 3141) 

OMNI|NT01BS0989 (20 - 633 of 652) putative integral membrane protein, putative 
%Match =14.6 

%Identity =33.0 %Similarity =58.0 
Matches = 201 Mismatches = 244 Conservative Sub.s = 153 

825 855 885 915 945 975 1005 1035 

YYLVLIGASmFPVIYWISGHKGSHYFGDMPSSTRIKLGWSFFEWGCSiAi^FIIIGYLMGIHLPVYKILPLFCIGffiVG 

: III! : :: :|| | 

20 LELQLtNGSWPGPVIYFALFAMGIHADIRYVFGVFVIAAIGG 

10 20 30 40 

1065 1095 1125 1155 1185 1215 1245 1260 

IVSLIPGGLGSFELVLFTGFAAEGLPECETWAWLLLYRLAYYIIPFFAGIYFFIHYLGSQINQRYENVPK ELVST 

25 ::||:|||:|||:|::: | | :| :| ::|||||| ||| |::| | :|| 1= I = 

MISLVPGGFGSFDLLFLLG^IEQLGYHQFAIVTSIVLYRrAYSFIPFII^LFFAAGDLTEOTMKSLETNPRIAPAIET'IW 
60 70 80 90 100 110 120 

1290 1311 1341 1371 1398 1428 1458 1482 

30 VXiQTMVSHLMRIL-GAF^"LIFSTAFFENlTYir^LQKLGLDP"LQEQMLWQFPGLLLGVCFILLARTID--QKVKNAFP 
:| : 1 = 111 l = = ::| = : : : :| : i : I I II I =111 1= = = I :» 
LLWQRAVLWILQGSLSLIVFVAGLIVLASVSLPlDRLTVIPHIPRPALL^FNGLSLSSAIilLLILPIELYKRTKRSYT 
140 150 160 170 180 190 200 

35 1512 1542 1572 1602 1632 1659 1689 1719 

IAIIWITLTLFYMLGHISWRLSFWFILLIjLGLLVIKPTLYKKQFIYSWEE-RIKDGIIIVSLMGVLFYIAGLLFPIRAH 
:|| : : , | :: :| |:| :: :|:: | ||||: : | |..| | .: |. 

MAITALVGGFVFSFLKGLN- -ISAIFVLPMIIVLLV LLKKQFVREQASYTLGQLIFAVALFTVALFNYNLIAGFIWD 

220 230 240 250 260 270 

40 

1749 1779 1797 1827 1857 1887 1917 1947 

ITGGSIERLHYIIAWEPIALAT LILTLVYLCLVKILC^KSCQIGDVFJ^VDRYKKLLQAYGGSSDSGIAFLNDKRIiY 

: ::: : | || :|: | :| : : : :: ||: = =1 = 1 II- I I II 111 = 1 

RMKKVLRHEYFraSTSHITHATI^IIIVPLFFLIFnWHKRTKPIGEKADPERLAAFLNEKGGNALSHLGFLGDKRFY 
45 290 300 310 320 330 340 350 

1977 2007 2037 2067 2097 2127 2157 2187 

WYQKNGEDCWAFQFVIVNNKCLIMGEPAGDDTYIREAIESFIDDADKLDYDLVFYSIGQKLTLtiLHEYGFDFMKVGEDAL 
= s| = I = = :::|:|:| =1 l = = = l = = = = ll I :: I l==l==l hll = l 

50 -FSSDGNALLLF--GKIAPJILvVLGDPSGQRESFPLVLEEFI^ 1 EAHQKGFSVLFYQIEREDMALYHDFGYNFFKIjGEEAY 
370 380 390 400 410 420 430 

2217 2247 2277 2307 2337 2367 2397 2427 

VNLETFTLKGNKYKPFRNALNRWKDGFYFEWQSPHSQELIJSISL 
55 |:| Mil || :| II I:: : I I : I I =1 l = :|| = II = |||||||:|: hhllll =1 

VDIJSTTFTLTGKKKAGLRAIIMRFEREEYTFHOTHPPFSDAFLEELKQISDE^GSKKEKGFSLGFFDPSYLQKAPIAYMK 
450 460 470 480 490 500 510 

2457 2487 2517 2547 2577 2607 2637 2667 

60 NAEHEWAFANIMPNYEKS1ISIDLMRHDKQKIPWGVMDFLFLSLFSYYQEKGYHYFDLGMAPLSGVGRVETSFAKERMA 
. Ill 1 = 1 II I 1 = 1 I l = = 11 = 1111= = 111 = 11 ll> =1 = =1 = 1 l = = lllll= II III III 
NAEGEIVAFANVMPMYQEGEISVDIjMRY-RGDAPNGIMDALFIFJtfFLWAKEE 

530 540 550 560 570 580 590 

65 2697 2727 2757 2787 2817 2847 2877 2907 

YLVYHFGSHFYSFNGLHKYKIOCFTPLWSERYISCSRSSWLieAICALL^ 
' ==== ' 111=11 =1=1= I I =l== == I 
AVIFMlTO'H^SFSGLRAFKEKyKPEWRGKXI^.YRKlTRSIiSVTMFLVTRLIGKSKKDSV 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1048 

A DNA sequence (GBSx1120) was identified in S.agalactiae <SEQ ID 3237> which encodes the amino 
acid sequence <SEQ ID 3238>. This protein is predicted to be a choline transporter. Analysis of this protein 
sequence reveals the following: 



Possible site: 37 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = [...] Transmembrane 28 - 44 ( 22 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane 178 - 194 ( 175 - 204) 
INTEGRAL Likelihood = [...] Transmembrane 81 - 97 ( 63 - 105) 
INTEGRAL Likelihood = [...] Transmembrane 209 - 225 ( 205 - 226) 
INTEGRAL Likelihood = [...] Transmembrane [...] 
INTEGRAL Likelihood = [...] Transmembrane [...] 
INTEGRAL Likelihood = [...] Transmembrane [...] 

Final Results 

bacterial membrane --- Certainty=0.5097 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45530 GB:AF162656 choline transporter [Streptococcus pneumoniae] 
Identities = 326/505 (64%) , Positives = 409/505 (80%) , Gaps = 1/505 (0%) 

Query: 1 MTTLITTFQERFGDWTQSLIEHLQLSLLTLILATLIAIPLGIIISHYKKISHVVIjQITGI 60 
MT LI TFQ+RF DW +L +HLQLSLLTL+LA L+AIPL + + +++K++ VLQI GI 
Sbjct: 1 MTTOjIATFQDRFSDWLTALSQHLQLSLLTLLLAILLAIPLAVFLRYHEKLADWVLQIAGI 60 

FQTIPSLALLGLFIP MGIGT+PA+ AL+IYA+FPILQNT+T L ID ML EA AFGM 

TRWERLKKFE+ L+MPVI+SGIRTA+V+IIGTATLA+LIGAGGLGSFILLGIDRNN SLI 

LIGA+SSAVLAI F+ L+ ++EKA+LRTI 

5 EP+IL NMYK LIE+ T 4 

-FLY+ALK GDID+YPEFTGT+T S 

LL+ PKVS+ P+QVY +A++GI KQD L+ L PM+YQNTYAVAV K A+ LK ISD 

LKK++ +LKAGFTLEF DREDG+ GLQ YGLNL+++T+EPALRYQAI S D+ I DAYS 

TD+EL +Y LQ+L+DDK LFPPYQGAPL+++ +KK+P++-++ LN LAG ITE +M ++N 

Query: 480 YQVAVKHKSAATVAKQYLKAHHIIK 504 
There is also homology to SEQ ID 636. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1049 

A DNA sequence (GBSx1121) was identified in S.agalactiae <SEQ ID 3239> which encodes the amino 
acid sequence <SEQ ID 3240>. This protein is predicted to be a choline transporter (opuBA). Analysis of this 
protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.2345 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MISFENVSKSYGDHTIIDNISCHIQRGEFFVLVGASGSGKTTILKMINRLIEPSQGAITL 60 
MI ++NV+ Y + ++ +++ 1+ GEF VLVG SGSGKTT+LKMINRL+EP+ G I + 
Sbjct: 1 MIEYKNVALRYTEKDVLRDVNLQIEDGEFMVLVGPSGSGKTTMLKMINRLLEPTDGNIYM 60 

Query: 61 DGENITSLDLRQLRLETGYVLQQIALFPNLTVGENIELIPEMKGWSKGDQKKAASDLIiDK 120 

DG+ I D R+LRL TGYVLQ IALFPNLTV ENI LIPEMKGWSK + K +LL K 
Sbjct: 61 DGKRIKDYDERELRLSTGYVLQAIALFPNLTVAENIALIPEMKGWSKEEITKKTEELLAK 120 


Query: 121 VGLPAKDYFNRYPHELSGGEQQRIGILRAIVAKPKVLLMDEPFSALDPISRRQLQDITKQ 180 

VGLP +Y +R P ELSGGEQQR+GI+RA++ +PK+ LMDEPFSALD ISR+QLQ +TK+ 
Sbjct: 121 VGLPVAEYGHRLPSELSGGEQQRVGIVRAMIGQPKIFLMDEPFSALDAISRKQLQVLTKE 180 

Query: 181 LQSELGITLVFVTHDMKEAMRIiADRICVIKEGKIVQLDRPEIIQNNPSDQFvRTLF 236 

L E G+T +FVTHD EA++LADRI V+++G+I Q+ PE I P+ FV LF 

There is also homology to SEQ ID 644. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1050 

A DNA sequence (GBSx1122) was identified in S.agalactiae <SEQ ID 3241> which encodes the amino 
acid sequence <SEQ ID 3242>. This protein is predicted to be a two-component response regulator. Analysis 
of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.52 Transmembrane 49 - 65 ( 46 - 66) 

Final Results 

bacterial membrane --- Certainty=0.3208 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06434 GB:AP001516 two-component response regulator [Bacillus halodurans] 
Identities = 101/305 (33%) , Positives = 152/305 (49%) , Gaps = 31/305 (10%) 
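The summary line above can be read mechanically. The following is a minimal sketch, not part of the original analysis, that extracts the three count/length pairs and recomputes the percentages; the function name `parse_blast_summary` and the regular expression are illustrative assumptions. For this particular hit the reported percentages equal the recomputed ratios with the fractional part discarded; no claim is made that this holds for every hit in the listing.

```python
import re

# Hypothetical helper: matches "Name = count/length (pct%)" triples as printed
# in the summary line of a database hit.
SUMMARY = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)

def parse_blast_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY.findall(line)
    }

line = ("Identities = 101/305 (33%) , Positives = 152/305 (49%) , "
        "Gaps = 31/305 (10%)")
stats = parse_blast_summary(line)

for name, (count, length, pct) in stats.items():
    # Recompute the ratio so it can be compared with the printed percentage.
    print(f"{name}: {count}/{length} = {100 * count / length:.1f}% "
          f"(reported {pct}%)")
```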

Query: 1 MKFYIIDDDPTITMILQDIIE-EDFHHTVVRVNOTSSKAYIffiLLIADVDIVljIDLLMPIIi 59 

M F+I DDD T+ IL HE B V + S LI VDI+LIDLLMP 

Sbjct: 1 MNFFITDDDVTVRSI1AQIIEDEQLGQWGEAEDGSELDGKRLNIKQVDILLIDLLMPNC 60 

Query: 60 DGVTLVQKIYKQRSDLKFIMISQVKDOTLRQFAYKAGIEFFINKPINIIEVKSWKRVTD 119 

DG+ +QKI K K IMISQ++ +Ii EAY GIE +1 KPIN IEV SV+++V + 

Sbjct: 61 DGLEAIQKI-KPEFKGKIIMISQIESKELISEAYLLGIEHYIMKPINKIEVLSVIRKVIN 119 

Query: 120 TIEMQKKLNTIQNLLENTPSYQKPITTSNLT KIRS ILSYLGITSETAYTDIL 171 

+++ L IQ L N P ++ I+S +LS LGI E+ D++ 

Sbjct: 120 HTRLEQSLYDIQKSLSNVLQGSIPTQVHDQVFHDDSIKSYGQYLLSELGIAGESGSKDLM 179 

Query: 172 NICELLLKQELNF AQFDFQKELSIDE HQQKI ILQRIRRAVKK 213 

NI Ii E + AD ++L+ ++ + K QR+RRAV + 

Sbjct: 180 NILMFLYTYEKEYSFEKGFPALKDIFEQmSEKLGDAADERDVRREVKAAKQRVRRAVYQ 239 

Query: 214 AMINMAHLYIDDFENELTLQYANALFGFQHIHNEAQLIQGK SMYGGKISLKHFFDEL 270 

++ ++A L + DF N +YA+ F F + ++ ++ + S +I-H-K F L 
Sbjct: 240 SLEHVASLGLIDFSNPKFEEYASHFFDFSVTOSKMTELKNETSSSYTSARINVKKFTQAL 299 

Query: 271 ILQSK 275 

Sbjct: 300 YYEAK 304 



There is homology to SEQ ID 460. 

A related GBS gene <SEQ ID 8723> and protein <SEQ ID 8724> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -7.05 
GvH: Signal Score (-7.5): -6.58 

Possible site: 61 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -5.52 threshold: 0.0 

INTEGRAL Likelihood = -5.52 Transmembrane 49 - 65 ( 46 - 66) 
PERIPHERAL Likelihood = 7.37 155 

modified ALOM score: 1.60 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.3208 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
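A "Final Results" block of this kind lists one certainty value per predicted location. As a minimal sketch (the names `parse_final_results` and `best_location` are hypothetical, and the pattern is written to tolerate the stray spaces that occur in this listing), the block can be reduced to the location with the highest certainty:

```python
import re

# Assumed layout: "bacterial <location> --- Certainty=<value> (<label>) < succ>".
# Whitespace inside the number (e.g. "0 . 3208") is tolerated.
RESULT = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W+Certainty\s*=\s*"
    r"(\d+)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return {location: (certainty, label)} for each line that matches."""
    return {
        loc: (float(f"{whole}.{frac}"), label.strip())
        for loc, whole, frac, label in RESULT.findall(text)
    }

def best_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])

block = """\
bacterial membrane --- Certainty=0.3208 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(best_location(results), results[best_location(results)])
```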



The protein has homology with the following sequences in the databases: 

ORF00604(307 - 1125 of 1431) 

EGAD ( 137180 [146289 (3 - 304 of 310) hypothetical protein {Bacillus cereus} 
GP|1769946|emb|CAA67094.1||X98455 orf1 {Bacillus cereus} 
%Match =12.7 
%Identity =34.1 %Similarity =53.0 

Matches = 95 Mismatches = 123 Conservative Sub.s = 53 



*C*W*YLSRNRAIPRAYFNGRAISRNDNCLS*SAKWOTIYOTIP*KSI*V}^*YWFyiIDDDPTITMILQDIIEK-DFlJ 

=11=111 =1 III: U 

MFYYIVDDDEVFRSMLSQIIEDGDLG 
405 435 465 495 525 555 585 615 

NTVVRVNNVSSKAYNELLIADVDIVLIDLLMPI LDGVTLVQK _ F 1ISQVKDMDLRQEAYKAGIEFFINKPI 

: : : =1 11= 1= I I 111111' I III WU = I lb 

EVIGESEDGAFVEAEQUSIYKKOTILFIDLLMPMRDGIETVRHI-ASSFTGKIIMISQVESKQIjIGEAYTLGVEYYITKPL 
40 50 60 70 80 90 100 



NIIEVKSWKRVTDTIEMQKKLKTIQNLLENTPSYQKP ITTSNLTKI RSILSYLGITSETAYTDILNICELL 

I III : | ::: : || I | ::|| I II I s|: III I |:|:= I I 

NKIEWSVTOK^IERIRLERSIYDIQKSLMWFQIffiKPQMRSET^.rQEEKKISDSGRFLLAELGrAGENGSKDLLSMDEYL 



861 894 924 954 984 1014 

LKQELNFAQFDFQKELSID EHQQKIILQRIRRAVKKAMINMAHLYIDDFENELTLQYANAL 

15 || |:| | | :=| 11=111= :== ==l 1=111 II = 

YGQE-KAQTFEFGFPALKDIFHQITLKKLGEIASDADIEKEKKASEQRVRRAIYQSLNHLASLGLTDFSNPKFESYAPKF 
200 210 220 230 240 250 260 

1071 1095 1125 1155 1185 1215 1245 

20 FGFQNIHNE-AQLIQGKSMYGGKISL--KHFFDELILQSKTF*DLFKHGLIYYNHPKTFLFINLQQTPCLPQGVCFCF*F 
||: :: : I : I I I = ::| 

FDFTVVRKRMTEMTKDGVATSGHIRINTKKFIQVLYFEAKRLMEIE 
280 290 300 310 

SEQ ID 8724 (GBS356) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 3; MW 34kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 8; MW 59kDa). 

GBS356-GST was purified as shown in Figure 216, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1051 

A DNA sequence (GBSx1123) was identified in S.agalactiae <SEQ ID 3243> which encodes the amino 
acid sequence <SEQ ID 3244>. Analysis of this protein sequence reveals the following: 



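Possible site: 26 
>>> Seems to have a cleavable N-term signal seq. 
INTEGRAL Likelihood = -6.18 Transmembrane [...] - 155 ( 147 - [...]) 
INTEGRAL Likelihood = -5.[...] Transmembrane [...] - 53 ( 29 - [...]) 
INTEGRAL Likelihood = -2.[...] Transmembrane [...] - 142 ( 126 - [...]) 
INTEGRAL Likelihood = -2.[...] Transmembrane [...] - 78 ( 60 - [...]) 
INTEGRAL Likelihood = -0.[...] Transmembrane [...] - 330 ( 314 - [...]) 
INTEGRAL Likelihood = -0.[...] Transmembrane [...] - 105 ( 89 - [...]) 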



Final Results 

bacterial membrane --- Certainty=0.3590 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06435 GB:AP001516 two-component sensor histidine kinase 
[Bacillus halodurans] 
Identities = 118/427 (27%), Positives = 199/427 (45%), Gaps = 25/427 (5%) 

Query: 10 LERRQRIIISAIAIA-LAAQINISILADGFIMTLSLFILPVFLYFNDDINPILLCLGITF 68 

L + II+S + A +A +IN + + F ++L I +FL F + 1+ 
Sbjct: 7 LSKDYMIILSMLLFAPIAGEINETTPVNETFRVSLGPPIFFLFLLFLRNTAAIVPGFFTAI 66 

Query: 69 ASPIFRGI ILSIAGEAEIHQI IEFVLTDMAFYICYGITFYTIYWHRSYRNKGTFFFSI 1 1 128 

A +FR ++++ E FYY+F R+ FII 

Sbjct: 67 AWVFRVFLDTLHADFYWVDSFEIHYPTFFFYFTYSLLFSLAKVQRFHEQPLIIFLFGII 126 
+ Y L+II+N+V NAVEAID KG +++ + L ++ 

--KGMLTTRVKALGQTVEFR 3 

I D+GPGIPDK + +IFKPGF4+KFD G 



Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1052 

A DNA sequence (GBSx1124) was identified in S.agalactiae <SEQ ID 3245> which encodes the amino 
acid sequence <SEQ ID 3246>. This protein is predicted to be ornithine carbamoyltransferase Otc6850 
(argF). Analysis of this protein sequence reveals the following: 
Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.64 Transmembrane 171 - 187 ( 171 - 187) 



Final Results 

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB75986 GB:AJ272085 ornithine carbamoyltransferase 
[Staphylococcus aureus] 
Identities = 264/332 (79%) , Positives = 292/332 (87%) 

Query: 1 MKNLRNRSFLTLLDFSTAEVEFLLKLSEDLKRAKYAGIEQQKLVGKNIALIFEKDSTRTR 60 
MKNLRNRSFLTLLDFS EVEFLL LSEDLKRAKY G E-)- L KNIAL+FEKDSTRTR 
Sbjct: 1 MKNLRNRSFLTLLDFSRQEVEFLLTLSEDLKRAKYIGTEKPMLKNKNIALLFEKDSTRTR 60 

CAFEVAAHDQGA+VTyLGPTGSQMGKKET+KDTARVLGGMYDGIEYRGFSQ TVETLAE4 

SGVPVWNGLTD DHPTQVLADFLTAKE L K Y DI FTYVGDGRNNVANALM GA+I+G 

M +HLVCPKEL P ELL++C+ IA G +1 IT DI +GV+ SDV+YTDVWVSMGEPD 
Query: 241 EVWKERIALLEPyRITQEMLNMTEKPm'IFEHCLPSFHNIDTWGYDIYEKYGLKEMEVS 300 

EVWKER+ LL+PY++ +EM++ T NPNVIFEHCLPSFKN DTK+G I +EKYG+ +EMEV+ 
Sbjct: 241 EVWKERtELLKPYQVWKEMMDKTGNPNVlFEHCLPSFHNADTKIGQQIFEICyGIREMEVT 300 

Query: 301 DEVFEGPHSWFQEAENRMHTIKAVMVATLGD 332 

DEVFE SWFQEAENRMHTIiffl.VMVaTLG+ 
Sbjct: 301 DEVFESKASWFQEAENRMHTIKAVMVATLGE 332 

There is also homology to SEQ ID 3118. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1053 

A DNA sequence (GBSx1126) was identified in S.agalactiae <SEQ ID 3247> which encodes the amino 
acid sequence <SEQ ID 3248>. This protein is predicted to be carbamate kinase (b2874). Analysis of this 
protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.48 Transmembrane 214 - 230 ( 214 - 230) 

Final Results 

bacterial membrane --- Certainty=0.1192 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 





FPF GA S+GYIGYHLQ ++ EL + Gl K V TI TQ+ VD++ 

v PTKPIG+FY KE +EK+ +KGYT EDAGRGYRRWASP+P 

+++ +VIA GGGGIPV+ G EG+ AVIDKD 4 

F K +QKAL E+N ++ Y+ +G+FA GSMLPKV AC F+ 



There is also homology to SEQ ID 3110. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
Example 1054 

A DNA sequence (GBSx1127) was identified in S.agalactiae <SEQ ID 3249> which encodes the amino 
acid sequence <SEQ ID 3250>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3558 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 1055 

A DNA sequence (GBSx1128) was identified in S.agalactiae <SEQ ID 3251> which encodes the amino 
acid sequence <SEQ ID 3252>. This protein is predicted to be a transmembrane protein (b2298). Analysis 
of this protein sequence reveals the following: 



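Possible site: 35 
>>> Seems to have a cleavable N-term signal seq. 
INTEGRAL Likelihood = [...] Transmembrane 413 - 429 ( 405 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 514 ( 489 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 181 ( 161 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 143 ( 122 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 324 ( 306 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 350 ( 330 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 210 ( 193 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 388 ( 371 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 266 ( 250 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 484 ( 468 - [...]) 
INTEGRAL Likelihood = [...] Transmembrane 436 - 452 ( 436 - [...]) 

Final Results 

bacterial membrane --- Certainty=0.6243 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 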



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22251 GB:U32741 conserved hypothetical transmembrane protein 
[Haemophilus influenzae Rd] 
Identities = 303/506 (59%), Positives = 389/506 (75%), Gaps = 6/506 (1%) 

Query: 10 NH?SKGFRMPGAFTILFILTIFSVIATmiPAGSYSKlQFDTASSKLvVTDPNGKTVHVP 69 

+K+ K F P AFTILF + I +V TW IP+GSYSKL +++ + W P 
Sbjct: 4 SKKKKTFNFPSAFTILFAILIIAVGLTWVIPSGSYSKLTYNSTDNVFWKAYGVDDKTYP 63 

Query: 70 ATQTQLDKMNVKIKIKEFTSGAISKPVSVPNTYKRLKQNPAGIGSVTTSMVNGTIEAVDI 129 

AT LD +N+KIK+ FT G I KP4-++P TY+R++Q+ GI +T SMV GTIEAVD+ 
Sbjct: 64 ATTDTLDNLNIKIKLSNFTEGVIKKPIAIPGTYQRVEQHHKGIEDITKSMVEGTIEAVDV 123 



Query: 130 1WFIMVLGGMIGWRECSGAFESGLLALTKKTKGREFLLIFLVSLLMVLGGTLCGIEEEAV 189 

MVFI VLGGMIGV+ ++G+F +GL+AL KKIKG EF ++F VS+LMVLGGT CGIEEEAV 
Sbjct: 124 IWFIFVLGX3MIGVINRTGSFNAGLMALVKKTKGNEFFIVFCVSVLMVLGGTTCGIEEEAV 183 

Query: 190 AFYPILVP I FLAMGYDS 1 1 CVGAI FLASSVGTSFSTINPFS S VIASNAAGI SFTEGLSWR 249 
Query: 250 TAGCIAGAIFWVYLHWYAKKIKRNPEPSYSYEDRVEFHAKWGMTWI-HTPSLFTIRQKI 308 

G + GA V+ YL+WY KKIKA+P FSY+Y+DR EF ++ + +T F+ R+K+ 
Sbjct: 244 ALGLVLGATCVIAYLYWYCKKIKADPSFSYTYDDREEFRQRYMKNFDPNTTIPFSARRKL 3 03 

Query: 309 ILSLFVISFPLMVWGVMSQGWWFPTMASSFLAITIIIMFLTATGANGIGERDWDEFVNG 368 

IL+LF ISFP+M+WGVM GWWFP MA+SFLAITI I IMF+ +G+ E+D+++ F G 

Sbjct: 304 ILTLFCISFPIMIWGVWGGWWFPQMAASFLAITIIIMFI SGLSEKDIMESFTEG 358 

Query: 369 ASSLVGVSLIIGLARGINIILSQGYISDTMIiYTASKLASHVSGSVFIIVMMFIYFVLGFV 428 

AS LVGVSLIIGLARG+N++L QG ISDT+L S + S + GSVFI+ + ++ LG + 
Sbjct: 359 ASELVGVSLIIGLARGVNLVLEQGMISDTILDYMSNWSGMPGSVFILGQLWFIFLGLI 418 

Query: 429 VPSSSGI^VLSMP1IAP1ADWGIPP.SVVVMAYQFGQYAMLFLAPTGLVMATLQMLDMKY 488 

VPSSSGLAVLSMPI+APLAD+VGIPR +W AY +GQYAMLFLAPTGLV+ TLQML + + 
Sbjct: 419 TOSSSGi^VLSMPI^PIJUDSVGIPRDIWSAYM\'GQYAMLFIAPTGLVLVTLQMLQIPF 478 

Query: 489 SHWLKFVWPVVLFLLIFGGGLLVLQV 514 

W+KFV P++ LL+ G LLV+QV 
Sbjct: 4-79 DRWKFVMPMIGCLLLIGSILLWQV 504 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3253> which encodes the amino acid 
sequence <SEQ ID 3254>. Analysis of this protein sequence reveals the following: 



Possible site: 36 
>>> Seems to have a cleavable N-term signal seq. 
INTEGRAL Likelihood = [...] Transmembrane [...] - 495 ( 472 - 496) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 277 ( 258 - 280) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 169 ( 142 - 180) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 409 ( 391 - 411) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 97 ( 78 - 99) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 334 ( 314 - 338) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 368 ( 352 - 369) 
INTEGRAL Likelihood = [...] Transmembrane [...] - 136 ( 119 - 138) 
INTEGRAL Likelihood = [...] Transmembrane 204 - 220 ( 204 - 220) 

Final Results 

bacterial membrane --- Certainty=0.6286 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 



Query: 10 RIPSSYTVLFI 1 IAIMAVLTWFIPAGAYETAK GGG VISGTYKTVASNPQGFF 61 

++PSS+T++F +1 + +LT+ IPAG ++ G G +++GTY+T+ P+GF 

Sbjct: 3 KMPSSFTIIFSLIVFVTILTYVIPAGKFDKEFRQIGDGPKREIIVAGTYQTIDRGPRGFL 62 

Query: 62 DILmPVRGMLGVEGTDGAIQVSFFILMVGGFLGVVNKTGALDTGIASVVRKNKGREKML 121 

+M + M +G + A +V F+L+VGG G++ KTGA+D GI S+4+K ++K+L 
Sbjct: 63 HPIMTILTAMS--KGMEHAAEVIIFVLIVGGAYGIIMKTGAIDAGIYSLIKKLGHKDKLL 120 

Query: 122 IAILIPLFALGGTTYGMGEETMAFYPLLIPVMIAVGFDSIVAVAIILIGSQIGCLASTIN 181 

I +L+ +F++GGT GM EET+ FY ++IP+++A+G+D++V VAII +G+ +G +AST+N 
Sbjct: 121 IPLLMFIFSIGGTVTGMSEETLPFYFVMIPLIVALGYDNWGVAIIALGAGVGTMASTVN 180 

Query: 182 PFATGVAADAAGVSIADGMIWRVIQWVILVGMSIWFVYNYASKIEEDPSKSLVADKEEEH 241 

PFATG+A+ A +S+ DG +R++ + I + ++I +V YAS+I++DPSKSLV K+ EH 
Sbjct: 181 PFATGIASAIASISLQDGFSFRIVLYFISILVAIIWOTASRIKKDPSKSLVYSKKNEH 240 

Query: 242 KELF-QLQNSGEDLNraQRNVLTIFTLTFVIMII^LIPlffiDFGIKFFTNIlTOLTTMPIL 300 

+ F + + S ED NV TF ++ L+ FG I + ++ L 

Sbjct: 241 YQYFVKNEISKED NVQNTLEFTFARKLVLLL FGFM ILFLVFSIVQL 286 
Query: 301 GGVIGKTMGAFGTWYFPEITMLFIMMGVLVAIVYRMSEEDFFSSFLTGAGEFLGVAMICA 360 

G W+ E+TML++ + ++ A + R+ E + + +F+ G+ + A+I 

Sbjct: 287 G-- WWMQEMTMLYLGYAIISAFICRLGESEMWDAFVKGSESIiITAMjIIG 334 

Query: 361 IARGIQVIMNGGMITATILHLGETSLSGLSSQVFVILAYIFYLPMSFLIPSTSGLAGATM 420 

+ARG+ ++ + G+ITAT+L+ L L F+IL I + + F++PS+SG A TM 

Sbjct: 335 LARGVMIVCDDGLITATMUSRATNFLYNLPRPFFIILNEIIQIFIGFIVPSSSGHASLTM 394 

Query: 421 GI^PLGQFSNVPAHLVITAFQSASGIIMISPTSAIVMGALALGRVDLGTWWKFIGKFI 480 

IMA.PL F ++ V+ A Q++SG++N+I+PTS ++M L + ++ GTW+KF+ 
Sbjct: 395 PIMAPLADFLSIGRSSWIAMQTSSGLINLITPTSGVIMAVLGISKLSYGTWFKFVLPLF 454 

Query: 481 VMVMLVSVLLLWATF 496 

Sbjct: 455 IIEFFISILVIIANVY 470 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/542 (29%) , Positives = 274/542 (50%) , Gaps = 92/542 (16%) 

Query: 11 KRSKGFRMPGAFTILFILTIFSVlATWWIPAGSYSKLQFDTASSKLVVTDPNGKrVHVPA 70 

++ +GFR+P ++T+LFI+ + TW+IPAG+Y +TA 
Sbjct: 4 EKKRGFRIPSSYTVLFI I IAIMAVLTWFI PAGAY ETAKG 42 

Query: 71 TQTQLDKMNVKIKIKEFTSGAISKPVSVPNTYKRLKQNPAGIGSVTTSMVNG TI 124 

G IS TYK +NPG + +VG T 

Sbjct: 43 GGVIS GTYKTVASNPQGFFDILMAPWGMLGVEGTD 78 

Query: 125 EAVDI^FIMVLGGMIGVVRKSGAFESGLLALTKKTKGREFLLIFLVSLLMVLGGTIjCGI 184 

A+ + FI+++GG +GW K+GA ++G+ ++ +K KGRE +LI ++ L LGGT G+ 
Sbjct: 79 GAIQVSFFILMVGGFLGVVNKTGALDTGIASVVRKNKGREKMLIAILIPLFALGGTTYGM 138 

Query: 185 EEEAVAFYPILVPIFLAMGYDSIICVGAIFLASSVGTSFSTINPFSSVIASHAAGISFTE 244 

EE +AFYP+L+P+ +A+G+DSI+ V I + S +G STINPF++ +A++AAG+S + 
Sbjct: 139 GEETMAFYPLLIPVMIAVGFDSIVAVAIILIGSQIGCLASTINEFATGVAADAAGVSIAD 198 

Query: 245 GLSVffiTAGCIAGAIFVVVYLHWYAKKIKANPEFSYSYEDRVEFNAKWGMTTNHTPSLFTI 304 

G+ WR + + +++ YA KI+ +P S D+ E + + N L 

Sbjct: 199 GMIWRVIQWVILVGMSIWFVYNYASKIESDPSKSL-VADKEEEHKELFQLQNSGEDL-NK 256 

Query: 305 RQKIILSLFVISFPLMV W GVMSQ GWWF 331 

RQ+ +L++F ++F +M+ W GV+ + W+F 

Sbjct: 257 RQRNVLTIFTLTFVIMILSLIPVffiDFGIKFFTNIirrWLTTMPILGGVIGKTMGAFGTWYF 316 

Query: 332 PTMASSFIiAITIIIMFLTATGANGIGERDWCEFVWGASSLVGVSLIIGLARGINIILSQ 391 

p + F+ + +++ + + E D F+ GA +GV+4I +ARGI +I++ 

Sbjct: 317 PEITMLFIMMGVLVAIVYR MSEEDFFSSFLTGAGEFLGVAMICAIARGIQVIMNG 371 

Query: 392 GYISDTMLYTASKLASHVSGSVFIIVMMFIYF^GFWPSSSGLAVISMPILAPLADTVG 451 

G 1+ T+L+ S +S VF+I+ Y + F++PS+SGLA +M I+APL 

Sbjct: 372 GMITATILHLGETSLSGLSSQVFVILAYIFYLPMSFLIPSTSGLAGATMGIMAPLGQFSN 431 

Query: 452 IPRSWVMAYQFGQYAMLFIAPT-GLWATLQMLDMKYSHVnjKFWPWLFLLIFGGGLLVL 512 

+P +V+ A+Q + ++PT 4-VM L + + W KF+ ++ +++ LLV+ 

Sbjct: 432 VPAHLVITAFQSASGILNMISPTSAIVMGAIJVI^RvTJLGTWWKFIGKFIVMVM 493 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1056 

A DNA sequence (GBSx1129) was identified in S.agalactiae <SEQ ID 3255> which encodes the amino 
acid sequence <SEQ ID 3256>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-10.83 Transmembrane 25 - 
INTEGRAL Likelihood =-10.46 Transmembrane 153 - 



Final Results 

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13183 GB:Z99110 similar to two-component sensor histidine 
kinase [YkoG] [Bacillus subtilis] 
Identities = 119/446 (26%) , Positives = 212/446 (46%) , Gaps = 18/446 (4%) 

- -RTFQYVDI 127 

+ +++ V+RL+ E LF L I+L + + G L+ +R PI + 

+TP+T+I S S K + E +ES E IH +++ MKKL QLL L K+ 

+K+L L NAIK++ 

S++D G+GI + ++ RFY+ D AR + 4 G GLGLS+ K 

v V SKP G+ T+ F 



There is also homology to SEQ ID 1178. 

SEQ ID 3256 (GBS77) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 21 (lane 2; MW 78.5kDa) and in Figure 28 (lane 2; MW 78.5kDa). 

GBS77-GST was purified as shown in Figure 195, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1057 

A DNA sequence (GBSx1130) was identified in S.agalactiae <SEQ ID 3257> which encodes the amino 
acid sequence <SEQ ID 3258>. This protein is predicted to be CopR protein (tcrA). Analysis of this protein 
sequence reveals the following: 

Possible site: 33 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3963 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



MK+LWEDE + + L + + VD +NG + F YD+IILDVM+P +DG+ 



Q++ D IQ+ D+ ++LS ++ R I LT+KE+ +LE AR R +VL R I VW 



+SN+IDV 1+ LR K+D+ + LI+T RG+GYV+ 
3DSNVIDVAIRRLRAKIDDGFEVKLIQTVRGMGYVL 221 

There is also homology to SEQ ID 3260. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1058 

A DNA sequence (GBSx1131) was identified in S.agalactiae <SEQ ID 3261> which encodes the amino 
acid sequence <SEQ ID 3262>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.45 Transmembrane 18 - 34 ( 16 - 36) 








Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10281> which encodes amino acid sequence <SEQ ID 
10282> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3262 (GBS78) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 4; MW 23.8kDa). 

The GBS78-GST fusion product was purified (Figure 194, lane 4) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 317), which confirmed that the protein is immunoaccessible 
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1059 

A DNA sequence (GBSx1132) was identified in S.agalactiae <SEQ ID 3263> which encodes the amino
acid sequence <SEQ ID 3264>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.04 Transmembrane 15 - 31 ( 6 - 35)
INTEGRAL Likelihood = -1.28 Transmembrane 51 - 67 ( 51 - 67)

Final Results

bacterial membrane --- Certainty=0.5416 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3264 (GBS79) was expressed in E.coli as a GST-fusion product. GBS79d was expressed in E.coli
as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 154 (lanes 17 & 18;
MW 51kDa), in Figure 155 (lane 17; MW 51kDa) and in Figure 187 (lane 13; MW 51kDa). GBS79d was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
155 (lanes 2-4; MW 26kDa) and in Figure 183 (lane 5; MW 26kDa). Purified GBS79d-GST is shown in
Figure 243, lane 2.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1060 

A DNA sequence (GBSx1133) was identified in S.agalactiae <SEQ ID 3265> which encodes the amino
acid sequence <SEQ ID 3266>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5326 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10279> which encodes amino acid sequence <SEQ ID 
10280> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG20974 GB:AE005164 Vng6349c [Halobacterium sp. NRC-1]
Identities = 97/358 (27%), Positives = 163/358 (45%), Gaps = 20/358 (5%)

Query: 35 DPQIIKLTTRANIAIGTYEGFLESIINPMLLISPLLSQEAVLSSKLEGTHATLKDLLNYE 94
 D + A +G G + P +L + LL +EA+ S+++EG L + E
Sbjct: 70 DDDFYETLADATFWLGKLSGVSLELDFPPVLYTSLLRKEAMESAEIEGADVDYDALYSLE 129

Query: 95 AGNKVDIERDELHEII NYRKALFYALENISTINNIDSKGLPLSNRIIKEMHKIL 148
 D RDE E + R+ L Y 1+ +D+ G L+ ++ ++H+ 1
Sbjct: 130

[Alignment rows from Query 149 / Sbjct 188 to Query 325 / Sbjct 361: sequence text not recoverable; only the two match-line fragments below survive.]

 V A+ H QFE IHP+ DGNGR+GRIiLI L LY +LL P Y+S Y R+++ Y+

 ^ W -H-+EG+ AESt+J



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1061 

A DNA sequence (GBSx1134) was identified in S.agalactiae <SEQ ID 3267> which encodes the amino
acid sequence <SEQ ID 3268>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4370 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 46-48

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 3268 (GBS299) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 58 (lane 2; MW 62.2kDa) and in Figure 60 (lane 4; MW 62.2kDa). 

GBS299-GST was purified as shown in Figure 207 (lane 4) and Figure 225 (lanes 2-3). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1062 

A DNA sequence (GBSx1135) was identified in S.agalactiae <SEQ ID 3269> which encodes the amino
acid sequence <SEQ ID 3270>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4176 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1063 

A DNA sequence (GBSx1136) was identified in S.agalactiae <SEQ ID 3271> which encodes the amino
acid sequence <SEQ ID 3272>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1789 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1064 

A DNA sequence (GBSx1137) was identified in S.agalactiae <SEQ ID 3273> which encodes the amino
acid sequence <SEQ ID 3274>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3748 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1065 

A DNA sequence (GBSx1138) was identified in S.agalactiae <SEQ ID 3275> which encodes the amino
acid sequence <SEQ ID 3276>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1638 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12294 GB:Z99106 similar to transposon protein [Bacillus subtilis]
Identities = 84/291 (28%), Positives = 138/291 (46%), Gaps = 6/291 (2%)

Query: 6 MLDYlAVTIKGIAPDDVIEKILlLPKDKFVLNETOINKyQRHYSFSEIKVYFNKDWQSKM 65
 M4DY+ V+ K D +IE++L L KD + G Y Y IKV+++ ++
Sbjct: 31 MVDYIRVSFKTHDVDRIIEEVLHLSKDFMTEKQSGFYGYVGTYELDY1KVFYSAPDDNR- 89

Query: 66 GVFIELRGQGCRQYEEYMENNVIOTWVTLMKRISECHSNVTIUjDIMIDIFDDSLSVPLIYS 125
 GV IE+ GQGCRQ+E ++E W + + + TR D+A D S+P +
Sbjct: 90 GVLIEMSGQGCRQFESFLECRKKTWYDFFQDCMQQGGSFTRFDIAIDDKKTYFSIPELLK 149

Query: 126 YCKKQLCISTAKTFDYHEI<SLLENGEKYGE1WTIGTOGTQQW-CTYNKLLEQKLDQELPN 184
 +K CIS + D++ L +G G + G + ++ + CYK EQ +P
Sbjct: 150 KAQKGECISRFRKSDFNGSFDLSDGITGGTTIYFGSKKSEAYLCFYEKNYEQAEKYNIPL 209

Query: 185 TPL - SWTRAELRCWQEKANLLAKQI KEGRPLKEI YFEVINGHYRFVSPRDKDSNRWRRKT 243
 L W R ELR E+A + + + + Ii I ++IN + RFV D++ R KT
Sbjct: 210 EELGDW1TOYELRLKNERAQVAIDALLKTKDLTLIAMQIIJ3MYVRFVD-ADENITREHWKT 268

Query: 244 VKMKIDYLETQEKTVLSVKRTKPTLKRSEKWTEKQVSRTLGKLYVAKAESH 294
 +W+D++ + L VK K ++S W + T+ V +A+ H
Sbjct: 269 SLFWSDFIGDVGRLPLYVKPQKDFYQKSRNWLRNSCAPTM- - KMVLEADEH 317

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1066 

A DNA sequence (GBSx1139) was identified in S.agalactiae <SEQ ID 3277> which encodes the amino
acid sequence <SEQ ID 3278>. This protein is predicted to be integrase. Analysis of this protein sequence
reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1914 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB70622 GB:AJ243106 integrase [Streptococcus thermophilus]
Identities = 135/474 (28%), Positives = 233/474 (48%), Gaps = 68/474 (14%)

Query: 20 KAGWLVKFAMRFTHPITKKSHIQCYLSTGASKGKFTTKATPSICKLPSGKERLLVSDIKNT 79
 K G + VKF F + +T K ++ LS W+T +KK +GK +L S
Sbjct: 19 KTGYIEVKFRTYFNNQLTNK-RREILSD WYTIV- - -NKKDTTGKIKL- -SPQIKA 67

Query: 80 QLITQVTQEmKLVnDYlAELMGIKPKKAKKIiTLEEIAKPFDKEGNFYGKAFKAWH- - - 136
 + ++ ++ ITK+ ++ ++ K +TL+E+ + WH
Sbjct: 68 108

Query: 137 -ERVKPANNTLKTRVTIYNRYIEPNFDTRMSITKFAFMTDEIQNLIN ASSMHMAR 190
 ER A TL Y +1 + SI K + I+NL++ + +A+
Sbjct: 109 VERQLVAPKTLAGEDGRYRNHITKQIP-KNSILK-NIPSSLIKNLLDNLYPIGNHKRLAQ

Query: 191 KLHIYLKMIFDWSVENGQITLTQDPIASNKVKRRVLTKSEEQDK-KREDIAEKYLEASEV
 + L 1+ +++ + 1+ Q+P+ + R+ L S+E D+ K+ DI ++YLE+ E+
Sbjct: 167 GVKSDLTSIYKFAILHDYISPDQNPMPYISIGRKGL--SDELDRLIOCSDIEDQYLESWEL 224

Query: 250 NITORLIESWTNRPDNQLIADVLRMIFLTGMRPSE^ 309

VL ++ + N+ A + LTGMR EVLGL E+ +DF K V RA+ 

Sbjct: 225 KEVLSIVRKY NEQYARIFEFQALTGMRIGEVLGLKEEAIDFNKNIASVIRTRATH 279 

Query: 310 NKSDDMMEMOT,DEKERyRADLKTfCESVRTIPMSPEVEKILRHYIDRNI<FQAQFSPrYQD 369 

+ + 4- Y ++K +S R + +S +IL+ 1+ N +F+P Y+D 

Sbjct: 280 GGASE DSYEGNVKNLQSYRNVQLSKRAIEIIjKEEIEIaNHQHIRFNPDYKD 329 

Query: 370 LGYIjFTRTYIRAGNRQGSPLYHNELSQFLRGGSSQSAKYNKKAGKPYK DIDSFLDFG 426 

G++FT I + G+PL+++ L+ FL SS++ K N+ G P + DID+ L F 
Sbjct: 330 NGWIFTSKSIHKPDYNGTPI^YSVi™FL--NSSENGI<LN^-GNPRRAGIDIDNKLSFK 386 

Query: 427 RPIHVIPHMFRHSFISIMASEGIDLPTIREFVGHSEDSKEIERVYLHVIKKQKD 480 

+ H+ H+FRH+ IS +A +G+ h I++ VGHS S+ + +YLH+ KK KD 
Sbjct: 387 K- -HITTHIFRHTHISFIiAEQGVPLEAIQDRVGHSRGSR-VTEIYLHITKKTKD 437 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3279> which encodes the amino acid 
sequence <SEQ ID 3280>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5203 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 82/357 (22%), Positives = 155/357 (42%), Gaps = 52/357 (14%) 

Query: 135 WHERWPftNNTLKTRVTIYNRYIEPNFDTRMSITKFAFMTDEIQNLINA- - SSMHMARNL 192 

W K+T + R+ D+IK T +Q++I+ S + 
Sbjct: 73 WEHHQKSLKSTSVRSLDFRIRELRNLIDPEVMIAKIT--TKYLQSIIDKIPGSYDKRKRA 130 

Query: 193 HIYLKMIFDWSVENGQITLTQDPIASNKVKRRVLTKSEEQDKKREDIAEKYLEASEVNHV 252 

LK FD+++ ++4 +P+ S ++ + V T K ED+A+K+LE E+ 

Sbjct: 131 RQLLKQTFDYAIALEYVSI - -NPVISTQLAKFVKTI KDFEDVAQKFLEKDELK- - 181 

Query: 253 LRLIESWTimPDNQLIADVLRMIFLTGMRPSEVTjGI^DMLDFEiaOTIKVHWQRASKNKS 312 

RL++ R + +A+ +LGREL4D++ I++H 
Sbjct: 182 -RLLDEMYRRKGSIKMAYLAEFMSLNGCRIGEAIAIQPD--NIKNDI1EIH 229 

Query: 313 DDMMEALI^DEKERYRADLKTKESVRTIPMSPETOKILRHYIDRNKFQAQFSPTYQDLGY 372 

++ + + + KT S R ++ ++I++ + N + +P Y+D+GY 
Sbjct: 230 -GTLDYTSNGYRNAIKTTPKTNSS^ErLITKREKEIIQDILKINMEKNTNPNYKDMGY 288 

Query: 373 LFTRTYIRAGNRQGSPLYHNELSQFLRG3SSQSAKYNKKAGKPYKDIDSFLDFGRPIHVI 432 

+F +R G P+ N L+ +R NK+ KP + + 
Sbjct: 289 IFI SRNGVPIQDNALNTSIRAA NKRLEKPIQK ELT 323 

Query: 433 PHMFRHSFISIMASEGIDLPTIREFVGHSEDSKEIERVYLHVIKKQKDTMRGAVEKL 489 

H+FRH+ +S +A + L TI + VGH+ DSK +++Y HV K K+ + + +L 
Sbjct: 324 SHIFRHTLVSRLAENKVPLKTIMDRVGHA-DSKTTCiJIYTHVTKSMKNEVTOIIjNRL 379 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1067 

A DNA sequence (GBSx1140) was identified in S.agalactiae <SEQ ID 3281> which encodes the amino
acid sequence <SEQ ID 3282>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3023 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10277> which encodes amino acid sequence <SEQ ID 
10278> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



Query: 36 ^TYSDKNELKEEVLKSYKKYIAEFNDIP3KLKDLRIDEVDRTPAENLAYQVGWTTLILK 95 

MR Y+ K ELKEE+ K Y+KY AEF I E KD +++ VDRTP+ENL+YQ+GW L+L+ 
Sbjct: 1 MREYTSKKELKEEIEKKYEKYDAEFETISESQKDEKVETVDRTPSENLSYQLGWVNLLLE 60 

Query: 96 WESDEQSGLEVKTPTETFKWNQLGELYQHFTETYASLTIKELTAQLNDNUDAIGNMIDSM 155 

WE+ E +G V+TP +KWN LG LYQ F + Y +IKE A+h + V+ + I ++ 
Sbjct: 61 WEAKEIAGYNVETPAPGYKWNNLGGLYQSFYKKYG1YSIKEQRM1REAVNEVYKWISTL 120 

Query: 156 SDEVLFKPHMRNWADSATKNAVWEVYKFIHINTVAPFGTFRTKIRKWKIW 205 

SD+ LF+ R W AT A+W VYK+IHINTVAPF FR KIRKWK++ 
Sbjct: 121 SDDELFQAGNRKW ATTKAMWPVYKWIHINTVAPFTNFRGKIRKWKRL 167 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1068 

A DNA sequence (GBSx1141) was identified in S.agalactiae <SEQ ID 3283> which encodes the amino
acid sequence <SEQ ID 3284>. This protein is predicted to be 50S ribosomal protein subunit L33-related 
protein. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5420 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB66692 GB:U89998 50S ribosomal protein subunit L33 
[Lactococcus lactis subsp. cremoris] 
Identities = 43/49 (87%) , Positives = 46/49 (93%) 

Query: 1 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVVFTEVK 49

MRVNITLEHKESGERLYLT KNKRNTPD+L+LKKYS KLRKHV+F EVK 
Sbjct: 1 MRVNITLEHKESGERLYLTQKNKRNTPDKLELKKYSKKLRKHVIFKEVK 49 
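
The "Identities" figure reported above can be checked directly from the two aligned rows. The sketch below is illustrative only: the helper names are hypothetical, and the use of integer truncation for the percentage is an assumption that happens to agree with the figure printed for this alignment (43/49 shown as 87%).

```python
def count_identities(query, subject):
    """Count identical positions between two aligned rows of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # Gap characters never count as identities.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)

def percent_truncated(part, whole):
    """Integer percentage, truncated rather than rounded (assumed convention)."""
    return part * 100 // whole

# Rows copied from the GBS / Lactococcus lactis L33 alignment above.
query = "MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVVFTEVK"
subject = "MRVNITLEHKESGERLYLTQKNKRNTPDKLELKKYSKKLRKHVIFKEVK"

identical, aligned = count_identities(query, subject)
print(f"Identities = {identical}/{aligned} ({percent_truncated(identical, aligned)}%)")
```

"Positives" additionally depend on the substitution matrix used by the search program, which this text does not state, so the sketch deliberately stops at identities.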

A related DNA sequence was identified in S.pyogenes <SEQ ID 3285> which encodes the amino acid
sequence <SEQ ID 3286>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 48/49 (97%) , Positives = 48/49 (97%) 

Query: 1 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVVFTEVK 49
 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHV FTEVK
Sbjct: 1 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVTFTEVK 49

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1069 

A DNA sequence (GBSx1142) was identified in S.agalactiae <SEQ ID 3287> which encodes the amino
acid sequence <SEQ ID 3288>. This protein is predicted to be 50S ribosomal protein subunit L32-related 
protein. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3577 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB66691 GB:U89998 50S ribosomal protein subunit L32 
[Lactococcus lactis subsp. cremoris] 
Identities = 44/53 (83%) , Positives = 48/53 (90%) 

Query: 1 MAKPARHTSKAKRNKRRTHYKLTAPSVQFDETTGDYSRSHRVSLKGYYKGRKI 53

MA PARHTS AK+N+RRTHYKLTAP+V FDETTGDY SHRVSLKGYYKGRK+ 
Sbjct: 1 MAVPARHTSSAKKNRRRTHYKLTAPTVTFDETTGDYRHSHRVSLKGYYKGRKV 53 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3289> which encodes the amino acid
sequence <SEQ ID 3290>. Analysis of this protein sequence reveals the following:



Final Results

bacterial cytoplasm --- Certainty=0.5148 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 38/39 (97%) , Positives = 39/39 (99%) 

Query: 22 LTAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK 60 

+TAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK 
Sbjct: 1 MTAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK 39 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1070

A DNA sequence (GBSx1144) was identified in S.agalactiae <SEQ ID 3291> which encodes the amino
acid sequence <SEQ ID 3292>. This protein is predicted to be histidyl-tRNA synthetase (hisS). Analysis of
this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4357 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10275> which encodes amino acid sequence <SEQ ID
10276> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA78919 GB:Z17214 histidine--tRNA ligase [Streptococcus equisimilis]
Identities = 327/404 (80%), Positives = 361/404 (88%)

Query: 32 WQYVEMVIRNLFKQYHYDEIRTPMFEHYEVISRSVGDTTDIVTKEMYDFHDKGDRHITLR 91
 WQYVE V R FKQYHY 'EIRTPMFEHYEVISRSVGDTTDIVTKEMYDF+DKGDRHITLR
Sbjct: 1 WQYVEGVARETFKQYHYGEIRTPMFEHYEVISRSVGDTTDIVTKEMYDFYDKGDRHITLR 60

Query: 92 PEGTAPVVRSYVENKLFAPEVQKPTKMWIGSMFRYERPQAGRLREFHQVGVECFGSMNP 151
 PEGTAPVWSYVENKLFAPEVQKP K+YYIGSMFRYERPQAGRLREFHQ+GVECFGS NP
Sbjct: 61 PEGTAPWRSYVENKLFAPEVQKPVKLYYIGSMFRYERPQAGRLREFHQIGVECFGSANP 120

Query: 152 ATDVETIAMGHHLFEDLGIKNVKLHLNSLGNPESRQAYRQALIDYLTPIREQLSKDSQRR 211
 ATDVETIAM +HLFE LGIK V LHUJSLGN SR AYRQALIDYL+P+R+ LSKDSQRR
Sbjct: 121 ATDVETIAmYHLFERLGIKGVTLHLNSLGNAaSRAAYRQALIDYLSPMRDTLSKDSQRR 180

Query: 212 I^NPLRVLDSKEPEDKIAVENAPSILDYLDESSQAHFDAVCHMLDAIJiIIPYIIDTl#IVR 271
 L+ENPLRVLDSKE EDK+AV NAPSILDY DE SQAHFDAV ML+AL IPY+IDTNMVR
Sbjct: 181 LDENPLRVLDSKEKEDKIAVANAPSILDYQDEESQAHFDAVRSMLEALAIPYVIDTMvIVR 240

Query: 272 GLDYYNHT1 FEF I TEIEDNELT I CAGGRYDGLVSYFGGPETPAFGFGLGLERLLLI LDKQ 331
 GLDYYNHTIFEFITE++ +ELTICAGGRYDGLV YFGGP TP FGFGLGLERLLLILDKQ
Sbjct: 241 GLDYYNHT1FEFITEVDQSELTICAGGRYDGLVEYFGGPATPGFGFGLGLERLLLILDKQ 300

Query: 332 GISLPIENTIDLYIAVLGSEANIAALDLAQSIRHCjGFKVERDYLGRKIKAQFKSADTFNA 391
 G+ LP+E +D+YIAVLG++AN+AAL Ii Q+IR QGF VERDYLGRKIKAQFKSADTF A
Sbjct: 301 GVELPVEEGLDWIAVLGADANVAAIjALTCjAIRRCjGFTVERDYLGRKIKAQFKSADTFKA 360

Query: 392 KVIMTLGSSEVDSKEVGLKNNQTRQEVKVSFENIKTDFSSVLKQ 435
 KV++TLG SE+ + + LK+NQTRQE+ VSF+ I+TDF+S+ +
Sbjct: 361 KWITLGESEIKAGQAVLKHNQTRQEMTVSFDQIQTDFASIFAE 404

A related DNA sequence was identified in S.pyogenes <SEQ ID 3293> which encodes the amino acid 
sequence <SEQ ID 3294>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3183 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 339/424 (79%), Positives = 387/424 (90%)

Query: 13 MKLQKPKGTQDILPGESAKWQyVEKVIRNLFKQYHYDEIRTPMFEHYEVISRSVGDTTDI 72

MKLQKPKGTQDILPG++AKWQYVE+V R+ P QY+Y EIRTPMFEHYEVI SRSVGDTTDI 
Sbjct: 1 MKLQKPKGTQDILPGDAAKWQYVESVARDTF3QYNYGEIRTPMFEHYEVISRSVGDTTDI 60 

Query: 73 VTKEMYDFHDKGDRHITLRPEGTAPWRSYVENKLFAPEVQKPTKMYYIGSMFRYERPQA 132 

VTKEMYDF+DKGDRHITLRPEGTAPWRSYVENKLFAPEVQKP K+YYIGSMFRYERPQA 
Sbjct: 61 VTKEMYDFYDKGDRHITLRPEGTAPVWSYVENKLFAPEVQKPVKLYYIGSMFRYERPQA 120 

Query: 133 GRLREFHQVGVECFGSNNPATDVET1AMGHHLFEDLGIKNVKLHLNSLGNPESRQAYRQA 192 

GRLREFHQ+GVECFG+ NPATDVETIAM +HLFE LGIK+V LHLNSLG+ PESR AYRQA 
Sbjct: 121 GRLREFHQIGVECFGAiOTPATDVETIAMAYHLFEKLGIKDVTLHLNSLGSPESRAAYRQA 180 

Query: 193 LIDYLTPIREQLSKDSQRRLNENPLRVLDSKEPEDKIAVENAPSIIjDYLDESSQAHFDAV 252 

LIDYLTP+R+QLSKDSQRRL+ENPLRVLDSKE EDKLAVE APSILDYLDE SQAHF+AV 
Sbjct: 181 LIDYLTPMRDQLSKDSQRRLDENPLR\7LDSKEKEDKLAVEKAPSILDYLDEESQAHFEAV 240 

Query: 253 CHMLDALNIPYIIDTNIIVRGIjDYYNHTIFEFITEIEDNEIjTICAGGRYDGLVSYFGGPET 312 

ML+AL+IPY+IDTNMVRGIjDYY+HTIFEFIT +E -f-LTICAGGRYD LV YFGGPET 
Sbjct: 241 ICDMLEALDIPYVIDTNMVRGLDYYSHTIFEFITSVEGSDLTICAGGRYDSLVGYFGGPET 300

Query: 313 PAFGFGLGLERLLLILDKQGISLPIEtrriDLYIAVLGSEftNLAALDLAQSlRHQGFBCVER 372 

P FGFGLGLERLL+I++KQGI+LPIE +D+Y+AVLG AN AL+L Q+IR QGF ER 
Sbjct: 301 PGFGFGLGLERLLMIIEKQGITLPIETEMDIYLAVLGDGANSKALELVQAIRRQGFTAER 360 

DYLGRKIKAQFKSADTF AK++MTLG SEV++ + +KNN++RQEV+VSFE+ + T+F+++ 
Sbjct: 361 DYLGRKIKAQFKSADTFKAKLVMTLGESEVEAGKAVIKNNRSRQEVEVSFEDMMTNFANI 420 

Query: 433 LKQL 436 
+QL 

Sbjct: 421 SEQL 424 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1071 

A DNA sequence (GBSx1145) was identified in S.agalactiae <SEQ ID 3295> which encodes the amino
acid sequence <SEQ ID 3296>. This protein is predicted to be aspartyl-tRNA synthetase (aspS). Analysis of
this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5124 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10273> which encodes amino acid sequence <SEQ ID
10274> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14714 GB:Z99118 aspartyl-tRNA synthetase [Bacillus subtilis]
Identities = 339/585 (57%), Positives = 432/585 (72%), Gaps = 9/585 (1%)

Query: 20 RSMYAGRVRSEHIGTSITLKGWVGRRRDLGGLIFIDLRDREGIMQLVINPEEVSASVMAT 79
 R+ Y G + + IG S+TLKGWV +RRDLGGLIFIDLRDR GI+Q+V NP+ VS +A
Sbjct: 4 RTYYCGDITEKAIGESVTLKGWVQKRRDLGGLIFIDLRDRTGIVQWFNPD-VSKEALAI 62

Query: 80 AESLRSEFVIEVSGWTAREQA- -NDNLPTGETOIjiCVQELSIIjNTSKTTPFEIKDGIE-A 136
 AE +R+E+V+++ G V ARE+ N KL TG +E+ +++LN +KT PF I D E
Sbjct: 63 AEGIRNEYVIIJIQGKWAREEGTVNENLKTGAIEIHADGWVl^AAKTPPPAISDQAEEV 122

Query: 137 M)DTRMRYRYLDLRRPEMIjENFKLRAKVTHSIRNYLDNLEFIDVETPMLTKSTPEGARDY 196 

++D R+++RYLDLRRP M + +LR VT ++R++LD F+D+ETP+LT STPEGARDY 
Sbjct: 123 SEDVRLKHRYLDLRRPAMFQTMQLRHN\TKAWSFLDENGFLDIETPILTGSTPEGARDY 182 

Query: 197 LVPSRVNQGHFYALPQSPQITKQLLMNAGFDRYYQIVKCFRDEDLRGDRQPEFTQVDDET 256 

LVPSRV++G FYALPQSPQ4- KQLLM +G +RYYQI +CFRDEDLR DRQPEFTQ+D+E 
Sbjct: 183 LVPSRVHEGEFYALPQSPQLFKQLLMVSGrERYYQIARCFRDEDLRADRQPEFTQIDIEM 242 

Query: 257 SFLSDQEIQDIVEGMIAKVMKDTKGLEVSLPFPRMS.YDDAMNNYGSDKPDTRFDMLLQDL 316 

SF+S ++I + E M+AKVM++TKG E+ LP PRM YD+AMN YGSDKPDTRFDMLL D+ 
Sbjct: 243 SFMSQEDIMSLAEEMMAK^/MRETKGEELQLPLPRMTYDEAMNKYGSDKPDTRFDMLLTDV 302 

Query: 317 TEIVKEVDFKVFSEA SWKAIWKDKADKYSRKNIDKLTEIAKQYGAKGLAWLKYA 372 

++IVK+ +FKVFS A WKAI VK A YSRK+ID L A YGAKG1AW+K 

Sbjct: 303 SDIVKDTEFKVFSSAVANGGWKAIWKGGAGDYSRKDIDALGAFAANYGAKGLAWVKVE 362 

Query: 373 DNTISGPVAKFL-TAIEGRLTEALQLENNDLILFVADSLEVANETLGALRTRIAKELELI 431 

+ + GP+AKF 4- +L EAL DL+LF AD EV +LGALR ++ KE LI 

Sbjct: 363 ADGVKGPIAKFFDEEKQSKLIEALDAAEGDLLLFGACQFEVVAASLGALRLKLGKERGLI 422 

Query: 432 DYSKFNFLWWDWPMFEWSEEEGRYMSAHHPFTLPTAETAHELEGDLAKVRAVAYDIVLN 491 

D FNFLWV+DWP+ E EEGR+ +AHHPFT+P E +E ++A AYD+VLN 

Sbjct: 423 DEKLFNFLWVIDWPLLEHDPEEGRFYAAHHPFTMPVREDLELIETAPEDMKAQAYDLVLN 482 

Query: 492 GYELGGGSLRINQKDTQERMFKALGFSAESAQEQFGFLLEAMDYGFPPHGGLAIGLDRFV 551 

GYELGGGS+RI +PCD QE+MF LGFS E A EQFGFLLEA +YG PPHGG+A+GLDR V 
Sbjct: 483 GYELGGGSIRIFEKDIQEKMFALLGFSPEEAAEQFGFLLEAFEYGAPPHGGIALGLDRLV 542 

Query: 552 MLLAGKDNIREVIAFPKNNKASDPMTQAPSLVSEQQLEELSLTVE 596 

MLLAG+ N+R+ IAFPK AS MT+AP VS+ QL+EL L+++ 
Sbjct: 543 MLLAGRTNLRDTIAFPKTASASCLMTEAPGEVSDAQLDELHLSIK 587 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3297> which encodes the amino acid 
sequence <SEQ ID 3298>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 495/582 (85%) , Positives = 538/582 (92%) 



Query: 78 ATAESLRSEWIEVSGVVTAREQAITONLPTGEVELIWQELSILNTSKTTPFEIKDGIEAN 137 

ATAE LRSE+VIEV G V AR+QAND L TG VELKV L+ILNT+KTTPFEIKD +E + 
Sbjct: 78 ATAEPiRSEYVIEVEGFvFJ^QQANDKIATGIWELKVSALTILNTAKTTPFEIKDDVEVS 137 

Query: 138 DDTFJ^YRYLDLRRPEMLENFKLRAKVTHSIRNYLDNLEFIDVETPMLTKSTPEGARDYL 197 

DDTR+RYRYLDLRRPEMLENFKLRAKVTHSIRNYLD+LEFIDVETPMLTKSTPEGARDYL 
Sbjct: 138 DDTRLRYRYLDLRRPEMLENFKLRAK3/THSIRNYLDDLEFIDVETPMLTKSTPEGARDYL 197 

Query: 198 VPSRTOQ^HFYALPQSPQITKQLLMNAGFDRYYQIVKCFRDEDLRGDRQPEFTQVDLETS 257 

VPSRV+QGHFYALPQSPQITKQLLMNAGFDRYYQITOCFRDEDLRGDRQPEFTQVDLETS 
Sbjct: 198 VPSRVSQGHFYALPQSPQITKQLLMNAGFDRYYQIVICCFRDEDLRGDRQPEFTQVDLETS 257 

Query: 258 FLSDQEIQDIVEGMrAKVMKDTKGLEVSLPFPRMaYDDAMNNYGSDKPDTRFDMLLQDM 317 
 FLS+QEIQDIVEGMIAKVMK+TK ++V+LPFPRM+YD AMN+YGSDKPDTRF+MLLQDLT
Sbjct: 258 FLSEQEIQDIVEGMIAKVMKETKEIDWLPFPEMSYDVAMNSyGSDKPDTRFEMLLQDLT 317

Query: 318 EIVKEVDFKVFSE^SVVKAIWKIJKADK^SRKNIDKLTEIAKQYGAKGLAWLKYADNTIS 377 

VK DFKVFSEA VKAIWK AD+YSRK+IDKLTE AEQ+GAKGLAW+K D ++ 
Sbjct: 318 VTVKGNDFKyFSEAPAVKAIVWGI^RySIOTIDKLTEFAKQFGAKGl^WKVTDGQLA 377 

Query: 378 GPVAKFLTAIEGRLTEALQLENNDLILFVADSLEVANETLGALRTRIAKELELIDYSKFN 437 

GPVAKFLTAIE L+ L+L NDL+LFVAD+LEVAN TLGALR RIAK+L++ID S+FN 
Sbjct: 378 GPVAKFLTAIETELSSQLKLAEITOLVXjFVADTLEYANHTLGALRNRIAKDLDMIDQSQFN 437 

Query: 438 FLWVDWPMFEWSEEEGRYMSAHHPFTLPTAETAHELEGDIAKVRAVAYDIVLNGYELGG 497 

FLWWDWPMFEWSEEEGRYMSAHHPFTLPT E+AHELEGDLAKVRA+AYDIVLNGYELGG 
Sbjct: 438 FLWVDWPMFEWSEEEGRYMSAHHPFTLPrPESAHELEGDIAKVRfllAYDIVLNGYELGG 497 

Query: 498 GSLRINQKDTQERMFKALGFSAESAQSQFGFLLEAMDYGFPPHGGLAIGLDRFVMLLAGK 557 

GSLRINQK+ QERMFKALGF+A+ A +QF3FLLEAMDYGFPPHGGLAIGLDRFVMLLAGK 
Sbjct: 498 GSLRINQKEMQERMFKALGFTADEANDQFGFLLEAMDYGFPPHGGLAIGLDRFVMLLAGK 557 

Query: 558 DNIREVIAFPKNNKASDPMTQAPSLVSEQQLEELSLTVESYE 599 

DNIREVIAFPKNNKASDPMTQAPSLVSE QLEELSL +ES++ 
Sbjct: 558 DNIREVIAFPKUNKASDPMTQAPSLVSENQLEELSLQIESHD 599 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1072 

A DNA sequence (GBSx1146) was identified in S.agalactiae <SEQ ID 3299> which encodes the amino
acid sequence <SEQ ID 3300>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.44 Transmembrane 186 - 202 ( 182 - 205) 
INTEGRAL Likelihood = 

.5 - 131 ( 112 - 132) 
.1 - 157 ( 141 - 157) 
J.96 Transmembrane 43 - 59 ( 43 - 59) 



integral Likelihood = -3 
INTEGRAL Likelihood = -2 
INTEGRAL Likelihood = 



Final Results 

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12952 GB:Z99109 alternate gene name: yuxA-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 104/275 (37%), Positives = 181/275 (65%), Gaps = 1/275 (0%) 



IP+ IL W K+G FT+++ ++V ++++F+ ++P+ +L+ D L+NA+FGG+I G+G + 



S+GG DI+++ + K + VG+ FI+NGII+L AGLL GW+ ALY++VT++V++ 



RV DAI T+ K+ AMIVT K + + 1+ + RG+T + A+G + +E+K ++I ++T 







 RE D + ++ + DPKAF +-h +■ IGF D
Sbjct: 246 RYELYDLEKIVKEWKAFTNIVQTTGIFGFFRKD 280 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3301> which encodes the amino acid
sequence <SEQ ID 3302>. Analysis of this protein sequence reveals the following:

Possible site: 53 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.47 Transmembrane 87 - 103 ( 86 - 106) 

INTEGRAL Likelihood = -4.94 Transmembrane 185 - 201 ( 182 - 203) 

INTEGRAL Likelihood = -1.59 Transmembrane 114 - 130 ( 113 - 130) 

INTEGRAL Likelihood = -1.12 Transmembrane 42 - 58 ( 42 - 58) 

INTEGRAL Likelihood = -0.32 Transmembrane 140 - 156 ( 140 - 156) 
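
The INTEGRAL lines above each report a predicted transmembrane span as a likelihood, a core segment and, in parentheses, an outer range. A minimal sketch of reading such lines into records follows. The names `Segment` and `parse_segments` are hypothetical, and sorting the most negative likelihood first assumes that a more negative value marks a more confident segment, which matches the order in which the lines are printed here.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    likelihood: float
    start: int        # first residue of the core span
    end: int          # last residue of the core span
    outer_start: int  # first residue of the range in parentheses
    outer_end: int    # last residue of the range in parentheses

_SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return all segments, most negative likelihood first."""
    segments = [
        Segment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))
        for m in _SEGMENT.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.likelihood)

example = """\
INTEGRAL Likelihood = -5.47 Transmembrane 87 - 103 ( 86 - 106)
INTEGRAL Likelihood = -4.94 Transmembrane 185 - 201 ( 182 - 203)
INTEGRAL Likelihood = -1.59 Transmembrane 114 - 130 ( 113 - 130)
INTEGRAL Likelihood = -1.12 Transmembrane 42 - 58 ( 42 - 58)
INTEGRAL Likelihood = -0.32 Transmembrane 140 - 156 ( 140 - 156)
"""
segments = parse_segments(example)
for seg in segments:
    print(seg)
```

Lines damaged in extraction (for example with a missing likelihood value) simply fail to match and are skipped, so the count of parsed segments is worth comparing against the count of INTEGRAL lines.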

Final Results 

bacterial membrane --- Certainty=0.3187 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 37 AEKISASLLYGILSSIAWFFFQPGHVYSSGATGIAQVFSAL-SHRLLGYDFPIAFAFYL 95 
+++I +++YG L++++VN F P YSSG TG+AQ+ +AL SH LG +A ++ 
Sbjct: 8 SKRIVIAMVYGFLAAVS VNLFLIPAKTYSSGvTGVAQLLTALVSH- - LGGSLS VAALVFI 65

Query: 96 INIPLLrLAWYKIGHQFTlFTFITVSMSSFFIQrMPQVT--LTTDPLINAIFGGLVMGMG 153 

+N+PLL+LAW+KI HQ+ IF+ + V S F++I+P + T+ A+FGG ++G+G 

Sbjct: 66 LNVPLLVLAWFKINHQYA1FSIVAVFTSVIFLKIIPVPVQPILTERFAGALFGGALIGLG 125 

Query: 154 IGTGLKSRISSGGTDIVSLTLRKRTGKDVGSLSLMVNGAIIAFAGILFGWQYALYSMVSI 213 

+G ++ S+GGTD++ + + TGK VG+++ ++NG 1+ AGI FGW ALYS+V I 
Sbjct: 126 VGLCFRAGFSTGGTDVIVTLVGRLTGKRVGAVNNVINGMIILAAGIFFGWGAALYSIVEI 185 

Query: 214 FVSSRVTDAIFTKQKMQATIVTSHPERVIHMIHKRLHRGVTSINDAEGTYKHEQKAVLI 273 

FVSS + D I+T+Q+K+ Tl T PE + + + +H G T + D G Y +++ +V++ 
Sbjct: 186 FVSSLLMDYIYTQQQK^mrriFTKQPEALXKRMREFIH-GATEL-DGTGLYTNQETSVIM 243

Query: 274 TILTCEEYPEFKWLMLKTDPQAFVSVAENVRIIGRFVEDD 313 

T+++ + K ++ DP AFV++ + + GRF ++ 
Sbjct: 244 TWSKYDLTALKLWQDADPNAFVNIQSTMNLWGRFESNE 283 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 239/311 (76%) , Positives = 274/311 (87%) 

Query: 4 RRTPLEKKVKYIISVWAKKFGLLHTLKSISREKYAEKISASLLYGILSSVAVNFFFQPGH 63
 ++T +KKVKY+IS AKK GLLH L+SISREKYAEKISASLLYGILSS+AVNFFFQPGH
Sbjct: 3 KKTTYKKKVKYVISRGAKKVGLLHALRSISREKYAEKISASLLYGILSSIAVNFFFQPGH 62

Query: 64 vYSSGATGLAQVISAVSKHWFSFEIPVALAFYAINIPLLILSTOiaGHKFTIFTFITVTV 123
 VYSSGATGLAQV SA+S ++ P+A AFY INIPLLIL+W KIGH+FTIFTFITV++
Sbjct: 63

[Alignment rows from Query 124 / Sbjct 123 to Query 244 / Sbjct 243: sequence text not recoverable; only the two match-line fragments below survive.]

 SS FIQ+MPQ+TLTTDPLINAIFGGL+MG G+G KSRISSGGTDI+SLT+RK+TG+DV

 I IH+ LHRGVT INDAEGTY HE-hKAVLITILT EE+ +FK+LMLK DP+AFVSVAEN

Query: 304 VHIIGRFVDDD 314

V IIGRFV+DD 
Sbjct: 303 VRI IGRFVEDD 313 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
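
The "Identities" and "Positives" figures quoted with each alignment can be read mechanically from the summary line. The following is a minimal sketch only, not part of the analysis described in this specification; the names `SUMMARY_RE`, `parse_alignment_summary` and `percent` are hypothetical, and the truncating integer division is an assumption about how the printed percentages were rounded.

```python
import re

# Hypothetical helper names; this is an illustration, not the tool used in the text.
# Matches fields such as "Identities = 239/311 (76%)", tolerating stray spaces.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: (matches, aligned_length, reported_percent)} for one line."""
    stats = {}
    for name, matches, length, pct in SUMMARY_RE.findall(line):
        stats[name.lower()] = (int(matches), int(length), int(pct))
    return stats


def percent(matches, length):
    """Integer percentage, truncated (assumed rounding convention)."""
    return matches * 100 // length


if __name__ == "__main__":
    line = "Identities = 239/311 (76%) , Positives = 274/311 (87%)"
    for field, (matches, length, reported) in parse_alignment_summary(line).items():
        print(field, matches, length, reported, percent(matches, length))
```

Comparing the recomputed value with the reported one is a quick way to spot figures that were damaged in transcription.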

Example 1073 

A DNA sequence (GBSx1147) was identified in S.agalactiae <SEQ ID 3303> which encodes the amino 
acid sequence <SEQ ID 3304>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -3.72 Transmembrane 156 - 172 ( 155 - 174)

INTEGRAL Likelihood = -3.03 Transmembrane 112 - 128 ( 110 - 129)

INTEGRAL Likelihood = -2.34 Transmembrane 80 - 96 ( 79 - 96)

INTEGRAL Likelihood = -1.49 Transmembrane 60 - 76 ( 58 - 76)

Final Results 

bacterial membrane --- Certainty=0.2487 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05397 GB:AP001512 unknown conserved protein [Bacillus halodurans]
Identities = 113/278 (40%), Positives = 192/278 (68%), Gaps = 1/278 (0%)

Query: 7 KTKIKETILIAFGVALYTFGFVKFNMANHLAEGGISGVTLIIHALFGVNPALSSLLLNIP 66
+ K K +I G A+++FG V FNM N+LAEGG +G+TLI++ +F +NPA+++L+LNIP
Sbjct: 4 RLKWKNIVFILLGSAIFSFGLVYFNMENNLAEGGFTGITLILYFMFQINPAVTNLVLNIP 63

Query: 67 LFILGARILGKKSLLLTIYGTVLMSFFMWFWQQIP-VTVPLKNDMMLVAVAAGILAGTGS 125
+ ++G +ILG+ +L+ TI GTV +S F+ +Q+ + +PL +DM L A+ AG+ GTG
Sbjct: 64 ILLIGWKILGRVTLIYTIIGTVSVSVFLEMFQRWKFMDIPLHDDMTLAALFAGVFVGTGL 123

Query: 126 GLVFRYGATTGGADIIGRIVEEKSGIKLGQTLLFIDAIVLTSSLVYINLQQMLYTLVASF 185
G+VFR+G TTGG DII ++ G +G+T+ DA+V+ SSL+Y+N ++ +YTL+A F
Sbjct: 124 GIVFRFGGTTGGVDIIAKLGFRYLGWSMGKTHFMFDAVVIASSLIYLNYREAMYTLLAVF 183

Query: 186 VFSQVLTNVENGGYTVRGMIIITKESESAAATILHEINRGVTFLRGQGAYSGREHDVLYV 245
+ ++ V+ ++ Y+ + II++ +E+ A TIL E+ RG T L+G+G+++G E ++LY
Sbjct: 184 IAAKVIDFIQQTAYSAKAAFIISEHTEAIADTILKEMERGATTLKGKGSFTGTEKEILYC 243

Query: 246 ALNPSEVRDVKEIMADLDPDAFISVINVDEVISSDFKI 283
+ +E+ +K ++ +DP AF++V +V +VI F +
Sbjct: 244 VVGRNELIRLKSLVERIDPHAFVTVHDVQDVIGEGFTL 281

A related DNA sequence was identified in S.pyogenes <SEQ ID 3305> which encodes the amino acid 
sequence <SEQ ID 3306>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 

INTEGRAL Likelihood = - 



Transmembrane 112 - 128 



Transmembrane 



Final Results

bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>






The protein has homology with the following sequences in the databases: 

>GP:BAB05397 GB:AP001512 unknown conserved protein [Bacillus halodurans]
Identities = 116/276 (42%), Positives = 182/276 (65%), Gaps = 1/276 (0%)

Query: 9 KLLKLFLIALGVAIYTFGFVNFNMANALAEGGVAGITLILHAHFGINPAYSSLLFNLPLF 68
K + I LG AI++FG V FNM N LAEGG GITLIL+ F INPA ++L+ N+P+
Sbjct: 6 KWKNIVFILLGSAIFSFGLVYFNMENNLAEGGFTGITLILYFMFQINPAVTNLVLNIPIL 65

TI GTV +S F+ M+Q+ +++ L +DM L A+ AG+F G G GI

VFR+G TTGG DII ++ G +G+T+ + DA+V+ +SL Y++ + +YTL+A F+

II++H+EA A IL E+ RG T LKG+G+++G -t

+DP AF+++ DV +VI



An alignment of the GAS and GBS proteins is shown below. 

Identities = 206/286 (72%), Positives = 250/286 (87%) 

Query: 5 DLKTKIKETILIAFGVALYTFGFVKFNMANHLAEGGISGVTLIIHALFGVNPALSSLLLN 64
D TK+ + LIA GVA+YTFGFV FNMAN LAEGG++G+TLI+HA FG+NPA SSLL N
Sbjct: 5 DKITKLLKLFLIALGVAIYTFGFVNFNMANALAEGGVAGITLILHAHFGINPAYSSLLFN 64

Query: 65 IPLFILGARILGKKSLLLTIYGTVLMSFFMWFWQQIPVTVPLKNDMMLVAVAAGILAGTG 124
+PLFILGA+I GK+SL LTIYGTVLMS F+W WQ++P+ + L+NDMMLVAV AG+ +G G
Sbjct: 65 LPLFILGAKIFGKRSIJU^TIYGTTLMSAFIP^QIU'PIELGLENDMMLVAVVAGLFSGIG 124

Query: 125 SGLVFRYGATTGGADIIGRIVEEKSGIKLGQTLLFIDAIVLTSSLVYINLQQMLYTLVAS 184
SG+VFRYGATTGG DIIGRI EEK G KLGQTLL +DA+VLT+SL Y++L+ MLYTLVAS
Sbjct: 125 SGIVFRYGATTGGTDIIGRIAEEKFGAKLGQTLLLVDALVLTASLTYVDLKHMLYTLVAS 184

Query: 185 FVFSQVLTNVENGGYTVRGMIIITKESESAAATILHEINRGVTFLRGQGAYSGREHDVLY 244
FVFSQ+++ V+NGGYT+RGMIIITK SE+AA IL EINRGVT+L+GQGAYSG +++++Y
Sbjct: 185 FVFSQMISWQNGGYTIRGMIIITKHSEAAACAILTEINRGVTYLKGQGAYSGNDYNIMY 244

Query: 245 VALNPSEVRDVKEIMADLDPDAFISVINVDEVISSDFKIRRRNYDK 290
V LNP+EVR+VK I+A LDPDAFIS+I+VDEVISSDFKIRRRNYDK
Sbjct: 245 VTLNPTEVREVKRHAGLDPDAFISIIDVDEVISSDFKIRRRNYDK 290

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
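
The localisation predictions listed under "Final Results" follow a fixed line pattern, so the highest-certainty site can be extracted automatically. The sketch below is illustrative only and assumes that pattern; `LOCATION_RE`, `parse_localization` and `most_likely_site` are hypothetical names, and the code is not the prediction program used to produce the results in this specification.

```python
import re

# Hypothetical names; an illustration of reading "Final Results" lines only.
# Accepts an optional "---" separator and stray spaces inside the number.
LOCATION_RE = re.compile(
    r"^\s*bacterial\s+(\w+)\s*(?:-+\s*)?Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_localization(lines):
    """Map each predicted site to (certainty, verdict)."""
    results = {}
    for line in lines:
        match = LOCATION_RE.match(line)
        if match:
            site, value, verdict = match.groups()
            results[site] = (float(value.replace(" ", "")), verdict.strip())
    return results


def most_likely_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    report = [
        "bacterial membrane --- Certainty=0.3060 (Affirmative) < succ>",
        "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
        "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
    ]
    print(most_likely_site(parse_localization(report)))
```

Lines that do not match the pattern are skipped, so the surrounding report text can be passed in unchanged.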

Example 1074 

A DNA sequence (GBSx1148) was identified in S.agalactiae <SEQ ID 3307> which encodes the amino 
acid sequence <SEQ ID 3308>. This protein is predicted to be BacB protein. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4355 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MPSEKEILDALSKVYSEEVIQADDYFRQAIFELASQLEKEGMN-SLLATKIDSLINQYVL 59

M ++E+LD LSK Y++ I + + +FE A +L N + K+ ++ ++Y+ 

Sbjct: 1 MDKQQELLDLLSKAYNDPKIl^YECSLKriKLFECAKRLTTlIETNIGEVCYKLSTIlSISEYLA 60 

Query: 60 THQFDAPKSIFDLSRLVKTKASHYKGTA 87

H F+ PKSI +L + V + Y+G A
Sbjct: 61 RHHFEMPKSIIELQKFVTKEGQKYRGWA 88

A related DNA sequence was identified in S.pyogenes <SEQ ID 3309> which encodes the amino acid 
sequence <SEQ ID 3310>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2712 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 99/102 (97%) , Positives = 102/102 (99%) 

Query: 1 MPSEKEILDALSKVYSEEVIQADDYFRQAIFELASQLEKEGMNSLLATKIDSLINQYVLT 60

MPSEKEILDALSKVYSE+VIQADDYFRQAIFELASQLEKEGM+SLLATKIDSLINQY+LT
Sbjct: 7 MPSEKEILDALSKVYSEQVIQADDYFRQAIFELASQLEKEGMSSLLATKIDSLINQYILT 66 

Query: 61 HQFDAPKSIFDLSRLVKTKASHYKGTAISAIMLGSFLSGGPK 102 

HQFDAPKSIFDLSRLVKTKASHYKGTAISAIMLGSFLSGGPK 
Sbjct: 67 HQFDAPKSIFDLSRLVKTKASHYKGTAISAIMLGSFLSGGPK 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1075 

A DNA sequence (GBSx1149) was identified in S.agalactiae <SEQ ID 3311> which encodes the amino 
acid sequence <SEQ ID 3312>. This protein is predicted to be ArgS (argS). Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2522 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10271> which encodes amino acid sequence <SEQ ID 
10272> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF86984 GB:AF282249 ArgS [Lactococcus lactis subsp. lactis] 
Identities = 377/566 (66%), Positives = 464/566 (81%), Gaps = 5/566 (0%) 



+PNIAKPFSIGHLRSTVIGDS+A I++K+GY P+KINHLGDWGKQFG+LI AYKK+G+E 



+ A+PIDELLKLYV+INAEA+ D VDEE R+WF K+E D EA +W+WF D SL+EF 



h VTFD + GE+FY+DKMD ++E LE+KNLL ESKGA +V+LEKY + +PALIK 



VPFG+VT+GG K STRKG+V+ LE 



D E+WEI+K +++FP 1+ KAADN+EPSIIAK+AI+LAQ FNKYYAH RIL++DA++ R 



LAL AT+ VLKE+LRLLGV AP 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3313> which encodes the amino acid 
sequence <SEQ ID 3314>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1734 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 492/563 (87%) , Positives = 526/563 (93%) 

Query: 12 MDTKHLIASEIQKWPDMEQSTILSLLETPKNSSMGDLAFPAFSLAKTLRKAPQIIASDI 71 

MDTK LIASEI KWP++EQ I +LLETPKNS MGDLAFPAFSLAK LRKAPQ+ IAS + + 
Sbjct: 1 MDTKTLIASEIAKWPELEQDAIFNLLETCPKNSDMGDLAFPAFSLAKVLRKAPQMIASEL 60

Query: 72 AEQIKSDQFEKVEAVGPYVNFFLDKAAISSQVLKQVLSDGSAYATQNIGEGRNVAIDMSS 131

AEQI QFEKV AVGPY+NFFLDKA ISSQVL+QV++ GS YA Q+ G+GRNVAIDMSS
Sbjct: 61 AEQIDESQFEKVVAVGPYINFFLDKAKISSQVLEQVITAGSDYAQQDEGQGRNVAIDMSS 120

Query: 132 PNIAKPFSIGHLRSTVIGDSLAKIFDKIGYHPVKINHLGDWGKQFGMLIVAYKKWGNEEA 191 

PNIAKPFSIGHLRSTVIGDSLA+IF K+GY PVKINHLGDWGKQFGMLIVAYKKWG+E A 
Sbjct: 121 PNIAKPFSIGHLRSTVIGDSLAHIFAKMGYKPVKINHLGDWGKQFGMLIVAYKKWGDEAA 180 



